rjones (Tue, 13 Feb 2018 18:45:38 GMT):
User User_1 added by rjones.

rjones (Tue, 13 Feb 2018 18:45:48 GMT):
User User_1 removed by rjones.

rjones (Tue, 13 Feb 2018 18:47:05 GMT):
User User_2 added by rjones.

rjones (Tue, 13 Feb 2018 18:47:15 GMT):
Dan

rjones (Tue, 13 Feb 2018 18:47:51 GMT):
User User_3 added by rjones.

rjones (Tue, 13 Feb 2018 18:47:57 GMT):
User User_4 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:03 GMT):
User User_5 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:09 GMT):
User User_6 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:15 GMT):
User User_7 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:22 GMT):
User User_8 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:28 GMT):
User User_9 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:35 GMT):
User User_10 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:42 GMT):
User User_11 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:48 GMT):
User User_12 added by rjones.

rjones (Tue, 13 Feb 2018 18:48:54 GMT):
User User_13 added by rjones.

rjones (Tue, 13 Feb 2018 18:49:00 GMT):
User User_14 added by rjones.

rjones (Tue, 13 Feb 2018 18:49:05 GMT):
User User_15 added by rjones.

rjones (Tue, 13 Feb 2018 18:49:11 GMT):
User User_16 added by rjones.

rjones (Tue, 13 Feb 2018 18:49:18 GMT):
User User_17 added by rjones.

rjones (Tue, 13 Feb 2018 18:49:26 GMT):
User User_18 added by rjones.

rjones (Tue, 13 Feb 2018 18:49:35 GMT):
zac

rjones (Tue, 13 Feb 2018 18:49:41 GMT):
pschwarz

rjones (Tue, 13 Feb 2018 18:49:45 GMT):
grkvlt

rjones (Tue, 13 Feb 2018 18:49:50 GMT):
dplumb

rjones (Tue, 13 Feb 2018 18:49:55 GMT):
agunde

rjones (Tue, 13 Feb 2018 18:50:00 GMT):
amundson

rjones (Tue, 13 Feb 2018 18:50:05 GMT):
cianx

rjones (Tue, 13 Feb 2018 18:50:12 GMT):
TomBarnes

rjones (Tue, 13 Feb 2018 18:50:28 GMT):
jsmitchell

rjones (Tue, 13 Feb 2018 18:50:32 GMT):
drozd

rjones (Tue, 13 Feb 2018 18:50:36 GMT):
MicBowman

rjones (Tue, 13 Feb 2018 18:50:40 GMT):
RyanBanks

rjones (Tue, 13 Feb 2018 18:50:43 GMT):
achenette

rjones (Tue, 13 Feb 2018 18:50:46 GMT):
adamludvik

rjones (Tue, 13 Feb 2018 18:50:50 GMT):
askmish

rjones (Tue, 13 Feb 2018 18:50:53 GMT):
boydjohnson

rjones (Tue, 13 Feb 2018 18:52:52 GMT):
@Dan you are the channel owner, everyone else is a channel moderator, feel free to hand out ownership as you like. I _think_ moderators (which everyone else is) can invite and unmute people?

Dan (Tue, 13 Feb 2018 18:53:36 GMT):
cool. Thanks!

rjones (Tue, 13 Feb 2018 18:54:57 GMT):
To everyone else: you should be able to speak in here; anyone else that joins will be muted. I think when you unmute someone they may speak, but they may need to become a moderator. If you have any issues, email helpdesk@hyperledger.org :)

amundson (Tue, 13 Feb 2018 18:59:54 GMT):
I'm not a huge fan of limiting who can talk, and if we have a very liberal policy about who can talk then the channel name isn't great.

amundson (Tue, 13 Feb 2018 19:00:19 GMT):
that said, looking forward to more public discussions

amundson (Tue, 13 Feb 2018 19:05:27 GMT):
@grkvlt, silas, and I were chatting in the seth channel about priorities for seth - would be good for everyone who has thoughts to dump some comments there.

rjones (Tue, 13 Feb 2018 19:09:57 GMT):
@amundson the request was to set up an analog to the #fabric-maintainers channel.

Dan (Tue, 13 Feb 2018 19:38:36 GMT):
yep. that's what I asked for. not to restrict discussion, of course, but to provide a channel where we can discuss forward progress on sawtooth separately from troubleshooting questions more commonly discussed on #sawtooth

Dan (Tue, 13 Feb 2018 19:40:07 GMT):
User User_19 added by Dan.

Dan (Wed, 14 Feb 2018 19:00:46 GMT):
Room name changed to: sawtooth-core-dev by Dan

Dan (Wed, 14 Feb 2018 19:01:13 GMT):
changed name per amundson's comment.

Dan (Wed, 14 Feb 2018 19:02:32 GMT):
Sawtooth core development discussions

Dan (Wed, 14 Feb 2018 19:04:45 GMT):
#sawtooth tends to facilitate sawtooth troubleshooting and application development. this channel is focused on discussion of internal implementation details.

Dan (Wed, 14 Feb 2018 19:44:59 GMT):
We've added @grkvlt as a core contributor / seth maintainer! _merge button activated_

grkvlt (Wed, 14 Feb 2018 19:46:11 GMT):
thanks guys, appreciate it - hope i can contribute some useful stuff!

amundson (Wed, 14 Feb 2018 19:52:40 GMT):
thanks for the room rename @Dan

ColinAlstad (Fri, 16 Feb 2018 17:11:07 GMT):
Has joined the channel.

akeelnazir (Sat, 17 Feb 2018 13:06:04 GMT):
Has joined the channel.

aczire (Tue, 20 Feb 2018 20:06:24 GMT):
Has joined the channel.

aczire (Tue, 20 Feb 2018 20:07:03 GMT):
Has left the channel.

tomislav (Tue, 20 Feb 2018 23:09:11 GMT):
Has joined the channel.

vCloudernBeer (Wed, 21 Feb 2018 02:06:24 GMT):
Has joined the channel.

amundson (Wed, 21 Feb 2018 17:41:02 GMT):
I propose that we adopt an RFC process similar to the one used by Rust, documented here - https://github.com/rust-lang/rfcs

amundson (Wed, 21 Feb 2018 17:42:10 GMT):
Unless there are objections, I'd like to create this repo while the hackfest is still going on and draft our rules (based off of those in the Rust README.md).

tkuhrt (Wed, 21 Feb 2018 18:36:47 GMT):
Has joined the channel.

kdenhartog (Wed, 21 Feb 2018 18:47:22 GMT):
Has joined the channel.

nage (Wed, 21 Feb 2018 18:47:24 GMT):
Has joined the channel.

pschwarz (Wed, 21 Feb 2018 18:49:32 GMT):
That sounds good to me

grkvlt (Wed, 21 Feb 2018 19:33:29 GMT):
i just fixed an issue in the validator which I need to also fix in 1.0, so I made two pull requests, one against `master` and one against `1-0`, since the change is essentially the same in both. is this the best way to do things?

grkvlt (Wed, 21 Feb 2018 19:46:08 GMT):
also, looks like my PRs aren't getting built any more? see https://github.com/hyperledger/sawtooth-core/pull/1461

grkvlt (Wed, 21 Feb 2018 19:47:02 GMT):
e.g. https://build.sawtooth.me/job/Sawtooth-Hyperledger/job/sawtooth-core/job/fix%252Fget-block-txn-handler/1/console

grkvlt (Wed, 21 Feb 2018 19:48:26 GMT):
but https://build.sawtooth.me/job/Sawtooth-Hyperledger/job/sawtooth-core/job/PR-1461/1/ is fine...

Dan (Wed, 21 Feb 2018 20:14:02 GMT):
I see two issues: 1) git is hard for iterating on documents. 2) we also have jira as a source of record for features

Dan (Wed, 21 Feb 2018 20:16:30 GMT):
@grkvlt looks like that's building. maybe you manually kicked it off or someone else did?

grkvlt (Wed, 21 Feb 2018 20:19:02 GMT):
There's multiple builds, the one that fails is `continuous-integration/jenkins/branch` but `continuous-integration/jenkins/pr-head` and `continuous-integration/jenkins/pr-merge` seem to work. Looks like Jenkins can't work out the GitHub user for the one that fails?

grkvlt (Wed, 21 Feb 2018 20:19:50 GMT):
```
[Pipeline] readTrusted
Obtained bin/whitelist from 2335d3aa64375ea9fe1d65334cef2984f13cd927
[Pipeline] readTrusted
Obtained COMMITTERS from 2335d3aa64375ea9fe1d65334cef2984f13cd927
[Pipeline] sh
[jenkins-Sawtooth-Hyperledger-sawtooth-core-fix%2Fget-block-txn-handler-1] Running shell script
+ ./bin/whitelist COMMITTERS
USAGE: ./bin/whitelist [user] [whitelist]
```
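
The USAGE output above suggests the whitelist script received only one argument. A plausible sketch of the failing step, assuming the Jenkinsfile passes the change author via Jenkins' `CHANGE_AUTHOR` variable (which is empty for plain branch builds):

```bash
# Hypothetical reconstruction: CHANGE_AUTHOR is set by Jenkins for PR builds
# but empty for branch builds, so the script sees only one argument and
# prints its USAGE text instead of checking the user against the whitelist.
./bin/whitelist "${CHANGE_AUTHOR}" COMMITTERS
```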

pschwarz (Wed, 21 Feb 2018 21:00:11 GMT):
Jira is not great for collaborating on documents, either

Dan (Wed, 21 Feb 2018 22:41:23 GMT):
User User_20 added by Dan.

Dan (Wed, 21 Feb 2018 22:42:12 GMT):
@TreyZhong https://sawtooth.hyperledger.org/docs/core/releases/latest/app_developers_guide.html

TreyZhong (Wed, 21 Feb 2018 22:42:13 GMT):
Has joined the channel.

amundson (Wed, 21 Feb 2018 23:17:56 GMT):
@grkvlt looks like a jenkins or jenkinsfile issue we will have to dive into

amundson (Wed, 21 Feb 2018 23:27:44 GMT):
@grkvlt as far as 1.0 backports, @Dan @pschwarz @agunde and I were just talking about backport strategy while at the hackfest

amundson (Wed, 21 Feb 2018 23:28:49 GMT):
The general approach is going to be to bundle up the backports into a single PR and do a LR7 (long-running 7-day) test on it before merging it to 1-0. the idea is that then 1-0 is releasable at any given point.

amundson (Wed, 21 Feb 2018 23:29:55 GMT):
@Dan and I are the release maintainers for 1.0 and we both need to approve backports

amundson (Wed, 21 Feb 2018 23:30:29 GMT):
@pschwarz has a release of backports targeted for 1.0.2, and I assume he will add your fix to that

amundson (Wed, 21 Feb 2018 23:31:17 GMT):
we will write this up / revise it going forward. we are trying to reconcile what we would normally do (just open PRs against both branches) with our desire for long-running testing.

amundson (Wed, 21 Feb 2018 23:32:06 GMT):
we are targeting Monday as the cut-off for 1.0.2 backports

Dan (Wed, 21 Feb 2018 23:47:27 GMT):
I looked more at the rfc link amundson posted. I see what he wants to do and where this fits wrt jira. I'm good with all that. I will be sad if we don't call them `slap`s but it will be easier for people to understand / find if they are called rfcs.

amundson (Thu, 22 Feb 2018 00:00:44 GMT):
ok, I'll request sawtooth-rfcs

grkvlt (Thu, 22 Feb 2018 00:20:12 GMT):
@amundson understood re: 1.0 thanks, so i'll only merge the `master` version of that fix and leave PR 1462 for the 1.0.2 release

pschwarz (Thu, 22 Feb 2018 00:25:50 GMT):
@grkvlt I will probably close it, since I will cherry pick the change from master - an earlier change will also be cherry picked that would cause a conflict
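
A minimal sketch of the cherry-pick flow pschwarz describes; the branch name and commit placeholders are illustrative, not actual refs:

```bash
# Base the backport branch on 1-0, then pick the prerequisite rename
# before the fix that depends on it, so the two do not conflict.
git checkout -b backports-1-0 upstream/1-0
git cherry-pick <client_thread_pool-rename-commit>
git cherry-pick <validator-fix-commit>
```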

grkvlt (Thu, 22 Feb 2018 00:27:15 GMT):
@pschwarz note that that will cause the error you picked up on (thanks for spotting it) since the thread pool object name changes from `thread_pool` to `client_thread_pool` in `master`

pschwarz (Thu, 22 Feb 2018 00:27:48 GMT):
Right, that `client_thread_pool` will also be cherry picked

pschwarz (Thu, 22 Feb 2018 00:28:00 GMT):
That's what I meant

grkvlt (Thu, 22 Feb 2018 00:28:01 GMT):
+1

Dan (Thu, 22 Feb 2018 01:34:44 GMT):
kelly_

Dan (Thu, 22 Feb 2018 02:02:27 GMT):
silasdavis

Gabisan (Thu, 22 Feb 2018 12:03:08 GMT):
Has joined the channel.

grkvlt (Thu, 22 Feb 2018 15:17:35 GMT):
Guys, I'm thinking that the _Copyright 2017 Intel_ should maybe be _Copyright 2017-2018 the Hyperledger Foundation_ or not there at all? For example, my changes certainly aren't copyrighted by Intel...

amundson (Thu, 22 Feb 2018 17:08:11 GMT):
@grkvlt There is no copyright assignment as far as I know; if you add substantial changes, it is probably appropriate to add an additional Copyright line at the top of the file. For example: https://github.com/Cargill/sawtooth-supply-chain/commit/36ed7da0647016bd6f98a6b643bccfd0b3c1a769#diff-8ec4e10999f55039260973f374ce05c3R1

amundson (Thu, 22 Feb 2018 17:20:51 GMT):
@TomBarnes @Dan ^ any additional thoughts/feedback on copyright header approach?

TomBarnes (Thu, 22 Feb 2018 17:26:31 GMT):
Not sure - will have to seek guidance.

amundson (Thu, 22 Feb 2018 17:29:53 GMT):
@rjones who is the best person at Hyperledger to discuss things like copyright headers, etc.?

rjones (Thu, 22 Feb 2018 17:32:05 GMT):
Mike Dolan, probably. It is a sticky issue. For instance, we took all date references out of our headers for AllSeen Alliance.

rjones (Thu, 22 Feb 2018 17:32:56 GMT):
@amundson : escalate through Todd @tbenzies

tbenzies (Thu, 22 Feb 2018 17:32:56 GMT):
Has joined the channel.

amundson (Thu, 22 Feb 2018 17:33:42 GMT):
ok, thanks. I assume we can look at other projects for reference too.

amundson (Thu, 22 Feb 2018 17:34:09 GMT):
let's summarize anything we learn here, please so we all see it

TomBarnes (Thu, 22 Feb 2018 17:34:30 GMT):
Took a quick look at Fabric - files are marked "Copyright IBM" - so same model there.

rjones (Thu, 22 Feb 2018 17:34:30 GMT):
I'm not 100% on the way IP assignment works for Hyperledger

TomBarnes (Thu, 22 Feb 2018 17:35:21 GMT):
More specifically "Copyright IBM Corp. 2016 All Rights Reserved."

rjones (Thu, 22 Feb 2018 17:35:23 GMT):
If you have to add a specific copyright for every person that touches a file, I can't see that scaling.

amundson (Thu, 22 Feb 2018 17:35:59 GMT):
I've seen "Substantial changes" as a rule in other non-hyperledger projects

TomBarnes (Thu, 22 Feb 2018 17:36:37 GMT):
Took a quick look at Apache Spark - no evidence of copyright claims there - just the Apache license. https://github.com/apache/spark

amundson (Thu, 22 Feb 2018 17:38:39 GMT):
looks like @tbenzies is in this room and isn't muted

TomBarnes (Thu, 22 Feb 2018 17:40:02 GMT):
This is the guidance from the Apache foundation: http://www.apache.org/legal/src-headers.html#headers

grkvlt (Thu, 22 Feb 2018 18:19:03 GMT):
I'd go with this as @TomBarnes suggests - it seems to work well in the Apache projects I contribute to. So we just remove copyright altogether, but keep the Apache-2.0 license header?

kyogesh91 (Thu, 22 Feb 2018 20:29:00 GMT):
Has joined the channel.

Dan (Fri, 23 Feb 2018 03:38:33 GMT):
if linux foundation doesn't have guidance then we should do the apache guidance.

Dan (Fri, 23 Feb 2018 03:44:26 GMT):
btw, I think we have an unwritten policy on merges.. need to get it written, but the preference is for `rebase and merge`, which avoids merge commits cluttering history. The other thing that hasn't made it to email yet is that backports to the 1.0 branch require normal +2 approval and approval from @amundson and @dan as 1.0 release maintainers. If you add any dependencies, add @TomBarnes too. At the hackfest we discussed using jenkins artifacts from the PR in an LR test to vet the changes - this seems like a good way to continue our 1.0 KPIs in point releases.

Dan (Fri, 23 Feb 2018 04:19:07 GMT):
@agunde is the rust sdk versioned as 0.1.0 because it is incomplete and long term intent is to link version to sawtooth-core?

agunde (Fri, 23 Feb 2018 14:49:19 GMT):
Good question. That version was picked 7 months ago when the Rust SDK did not have transaction processor support. As we move closer to finalizing the API and doing other clean ups, the version should be updated. @amundson thoughts?

TomBarnes (Fri, 23 Feb 2018 16:25:47 GMT):
@grkvlt hi - to clarify, per the Apache guidelines, we remove the copyright from the individual files and move it to the NOTICE file.

Dan (Fri, 23 Feb 2018 16:49:43 GMT):
I took this discussion to #tsc https://chat.hyperledger.org/channel/tsc?msg=rT6r8wEehRRExxYmx

amundson (Fri, 23 Feb 2018 18:37:26 GMT):
I'm not sure Cargill will be on board with no copyright header, at least without some discussion

amundson (Fri, 23 Feb 2018 18:38:43 GMT):
@agunde @Dan I would be comfortable versioning it with the same version as the rest of core, once we decide on _ vs. -

amundson (Fri, 23 Feb 2018 18:39:01 GMT):
or no (_-)sdk

pmettu (Mon, 26 Feb 2018 22:48:07 GMT):
Has joined the channel.

VikasJakhar (Tue, 27 Feb 2018 20:05:16 GMT):
Has joined the channel.

grkvlt (Thu, 01 Mar 2018 16:17:16 GMT):
idea for future development: a `seth-workload` cli tool similar to the current `smallbank-workload` and the new intkey tool being proposed?

amundson (Thu, 01 Mar 2018 18:17:37 GMT):
that makes sense to me

amundson (Thu, 01 Mar 2018 23:22:35 GMT):
head on over to #sawtooth-ci for Jenkins/Build/CI discussion

formax (Fri, 02 Mar 2018 23:17:15 GMT):
Has joined the channel.

cuevrob (Mon, 05 Mar 2018 17:01:12 GMT):
Has joined the channel.

rjones (Tue, 06 Mar 2018 01:54:56 GMT):
Has left the channel.

amundson (Tue, 06 Mar 2018 16:07:38 GMT):
We are creating a Sawtooth RFC process, which will serve as a framework for design discussion.

amundson (Tue, 06 Mar 2018 16:07:58 GMT):
I can't add any more reviewers to the PR (github has a 15 person limit)

amundson (Tue, 06 Mar 2018 16:08:16 GMT):
however, everyone should review this PR which includes the process: https://github.com/hyperledger/sawtooth-rfcs/pull/3

mikezaccardo (Tue, 06 Mar 2018 20:14:20 GMT):
Has joined the channel.

MicBowman (Wed, 07 Mar 2018 17:56:32 GMT):
@Dan not sure if this question is for you... in the poet enclave code, how do you know that the incoming sealed data is really a sealed data structure? are you assuming that the decryption will fail? The question is really about the validity of the conversion from uint8_t* to sgx_sealed_data_t

vishwasbalakrishna (Wed, 07 Mar 2018 21:35:02 GMT):
Has joined the channel.

Dan (Thu, 08 Mar 2018 14:23:29 GMT):
Yeah I don't recall if my hands were in that or not, but that looks to be the case.. assume decryption will fail, and in case that was a bad assumption then do some sanity checks on what gets decrypted

amundson (Thu, 08 Mar 2018 18:27:40 GMT):
this repo is now live - https://github.com/hyperledger/sawtooth-rfcs

amundson (Thu, 08 Mar 2018 18:28:10 GMT):
I hope we enjoy the process more than we hate it. :)

grkvlt (Thu, 08 Mar 2018 19:06:41 GMT):
thx, @amundson - it looks useful for managing features going forward. i'm assuming `core_changes.md` will be created with core sub-team specific notes, which can be copied and modified for e.g. seth, hyper directory, explorer, et al

amundson (Thu, 08 Mar 2018 19:41:50 GMT):
yeah. for core, we will focus on things like API / protocol compatibility requirements, etc.

adamludvik (Thu, 08 Mar 2018 19:42:56 GMT):
@amundson with respect to creating the hyperledger/sawtooth-sdk-go repo, do we want to keep the commit history from sawtooth-core by cloning, or start from fresh commits?

amundson (Thu, 08 Mar 2018 19:46:59 GMT):
we should attempt to keep history to the extent possible

amundson (Thu, 08 Mar 2018 19:47:38 GMT):
if we felt the repo is too big and wanted to purge, we could purge everything except the go sdk from the commit history

adamludvik (Thu, 08 Mar 2018 20:37:18 GMT):
That is what I was thinking. Clone the repo and then do a purge commit to get it to a state similar to: https://github.com/rberg2/sawtooth-go-sdk

Gandalf (Fri, 09 Mar 2018 03:53:36 GMT):
Has joined the channel.

grkvlt (Fri, 09 Mar 2018 18:46:48 GMT):
so, using something like `git filter-branch --tree-filter "find . -not -path './sdk/go'"` to remove unwanted history?

adamludvik (Mon, 12 Mar 2018 14:09:55 GMT):
It will be similar to what we did with seth. I think I did it manually, but an automated option would be fine. This would be a one time thing and then the new repo would track new history from that point on.

smgulley (Mon, 12 Mar 2018 14:30:11 GMT):
Has joined the channel.

rjones (Tue, 13 Mar 2018 17:07:27 GMT):
Has joined the channel.

rjones (Tue, 13 Mar 2018 17:08:45 GMT):
https://github.com/hyperledger/sawtooth-hyper-directory/network/dependencies @Dan @rbuysse @rberg2 @amundson

rbuysse (Tue, 13 Mar 2018 17:08:45 GMT):
Has joined the channel.

rberg2 (Tue, 13 Mar 2018 17:08:45 GMT):
Has joined the channel.

amundson (Wed, 14 Mar 2018 17:28:02 GMT):
That code was contributed by Chris Spanton. Does anyone know if he is on Rocket Chat? @cianx @boydjohnson @dplumb?

boydjohnson (Wed, 14 Mar 2018 17:28:28 GMT):
I haven't seen him on rocket chat.

rjones (Wed, 14 Mar 2018 18:27:54 GMT):
Has left the channel.

amundson (Wed, 14 Mar 2018 21:05:39 GMT):
First Cargill contribution made it to a PR - https://github.com/hyperledger/sawtooth-supply-chain/pull/40

Dan (Wed, 14 Mar 2018 21:41:52 GMT):
Awesome! _(there's a typo in the PR message you should fix though @amundson)_ :-D

amundson (Wed, 14 Mar 2018 21:42:33 GMT):
heh

MicBowman (Wed, 14 Mar 2018 21:53:27 GMT):
@amundson do you know why the python ecdsa library was dropped?

MicBowman (Wed, 14 Mar 2018 21:53:39 GMT):
was it a licensing issue?

MicBowman (Wed, 14 Mar 2018 21:53:45 GMT):
(it appears to be MIT)

Dan (Wed, 14 Mar 2018 21:56:47 GMT):
check with @TomBarnes. I think it might have been a whitelisting issue.

MicBowman (Wed, 14 Mar 2018 21:57:40 GMT):
ok... it looks like its MIT license

MicBowman (Wed, 14 Mar 2018 21:58:35 GMT):
we are trying to make a lightweight client for 17.10... and secp256 doesn't compile with python 3.6

amundson (Wed, 14 Mar 2018 23:12:27 GMT):
I didn't know about the secp256/python 3.6 issue, we will have to investigate that

Dan (Thu, 15 Mar 2018 14:13:33 GMT):
just became aware there's an infrastructure channel: https://chat.hyperledger.org/channel/infra-support

amundson (Thu, 15 Mar 2018 15:05:15 GMT):
I'm considering writing up an RFC for adding partial validation compatibility - basically storing sub-tree merkle hashes in the block so that a portion of the tree can be verified independently using only transactions which modify that part of the tree. @Dan - we could use this for validator registry, so it can be validated early in consensus checks without processing all other transactions

Dan (Thu, 15 Mar 2018 20:21:03 GMT):
Not sure I understand that. The issue that poet runs into is that a registration isn't committed before nodes want to start publishing.

Dan (Thu, 15 Mar 2018 20:21:28 GMT):
@amundson can you clarify your feedback on https://github.com/hyperledger/sawtooth-core/pull/1501?

rjones (Fri, 16 Mar 2018 23:24:28 GMT):
Has joined the channel.

rjones (Fri, 16 Mar 2018 23:25:25 GMT):
I realize in advance that this is the wrong forum. Could whoever is in control of https://github.com/sawtooth-build account please accept the invites it has pending? it would make my life easier.

rjones (Fri, 16 Mar 2018 23:27:54 GMT):
I think the owner needs to visit https://github.com/hyperledger/ when authenticated as that user to see the invites.

amolk (Sat, 17 Mar 2018 04:14:22 GMT):
Has joined the channel.

pankajgoyal (Sat, 17 Mar 2018 05:11:32 GMT):
Has joined the channel.

Dan (Mon, 19 Mar 2018 15:56:43 GMT):
@pankajgoyal you may need to rebase your PR 1501 and re-push it. Build is failing probably due to some changes in go.

matthewehoward (Mon, 19 Mar 2018 18:12:51 GMT):
Has joined the channel.

Dan (Tue, 20 Mar 2018 03:23:20 GMT):
@adamludvik @askmish regarding https://jira.hyperledger.org/browse/STL-1113, the poet fork resolver seemed to have a contract with the BlockValidator that it would never be asked to look at a non-poet block. However the BlockValidator relies on the current chain's consensus for resolving forks without regard to what consensus the candidate block may have used. The exact assignment of which consensus is current seems to have evolved around 4-5 months ago. Where I left off was this commit that indicates no changes were made other than splitting the class into its own file. https://github.com/hyperledger/sawtooth-core/commit/c07b4645ce63b994c3d1528f55f9e5c64b849f2e. To resolve the bug it would be great to get the logs @pschwarz and probably to decide on the contract between consensus and block validator @adamludvik & @askmish.

Dan (Tue, 20 Mar 2018 14:21:43 GMT):
@pankajgoyal #1501 is blocking release 1.0.2. Please prioritize the rebase.

Dan (Tue, 20 Mar 2018 18:43:52 GMT):
@pankajgoyal thanks for the rebase. I've prodded jenkins to rebuild the PR. It built cleanly now and I have merged #1501 @pschwarz .

pschwarz (Tue, 20 Mar 2018 18:44:50 GMT):
Create a backport PR for it

Dan (Tue, 20 Mar 2018 19:17:36 GMT):
https://github.com/hyperledger/sawtooth-core/pull/1531

Dan (Tue, 20 Mar 2018 19:17:41 GMT):
@TomBarnes ^

rjones (Tue, 20 Mar 2018 19:19:27 GMT):
@Dan I don't know if you've noticed, but you can create and add groups to reviews. It's pretty nice.

Dan (Tue, 20 Mar 2018 19:20:34 GMT):
tell me more!

rjones (Tue, 20 Mar 2018 19:20:56 GMT):
I think you have permissions to make subgroups of: https://github.com/orgs/hyperledger/teams/sawtooth-core-contributors and then you can add the group in the review area

rjones (Tue, 20 Mar 2018 19:21:31 GMT):
(attachment: sawtooth.png)

rjones (Tue, 20 Mar 2018 19:21:45 GMT):
you can add any of those groups - some of which are probably a little broad :)

rjones (Tue, 20 Mar 2018 19:23:46 GMT):
if I'm wrong, and you can't make a sub team, tell me what ones you want and I'll add them and make you a maintainer. Then you can add what accounts you like

rjones (Tue, 20 Mar 2018 19:24:17 GMT):
if you go here: https://github.com/orgs/hyperledger/teams/sawtooth-core-contributors/teams are you able to add a team?

rjones (Tue, 20 Mar 2018 19:26:04 GMT):
heh my question is answered :)

Dan (Tue, 20 Mar 2018 19:26:04 GMT):
Cool. Yes I could create a team. I will use this for great evil. >:D

rjones (Tue, 20 Mar 2018 19:26:10 GMT):
mazel tov!

rjones (Tue, 20 Mar 2018 19:26:32 GMT):
with great powers come varying degrees of responsibilities ;)

rjones (Tue, 20 Mar 2018 19:27:59 GMT):
also... who owns https://github.com/sawtooth-build ? is it in use? could you talk them into accepting my invites? they have like a dozen outstanding

Dan (Tue, 20 Mar 2018 19:34:10 GMT):
I don't know who that is. Could have been something one of us did a long time ago? I have no recollection ... :beers: :cocktail: ... of many things.

rjones (Tue, 20 Mar 2018 19:35:50 GMT):
I'll remove it and see if anyone complains

grkvlt (Tue, 20 Mar 2018 19:49:28 GMT):
from the 'if it was really important then i wouldn't be able to delete it in the first place' school of devops...

rjones (Tue, 20 Mar 2018 19:55:05 GMT):
@grkvlt I set it to `read` access from `write` access. If nobody complains in a few weeks, I'll remove it

Dan (Tue, 20 Mar 2018 20:04:00 GMT):
@ryanbeck ^ just in case sawtooth-build is something you know about

ryanbeck (Tue, 20 Mar 2018 20:04:00 GMT):
Has joined the channel.

ShikarSharma (Tue, 20 Mar 2018 22:56:18 GMT):
Has joined the channel.

rbuysse (Wed, 21 Mar 2018 17:14:04 GMT):
Has left the channel.

rbuysse (Wed, 21 Mar 2018 17:14:11 GMT):
Has joined the channel.

adamludvik (Wed, 21 Mar 2018 17:39:07 GMT):
@amundson @pschwarz @jsmitchell Have you seen this? https://blog.ethereum.org/2015/06/26/state-tree-pruning/

rbuysse (Wed, 21 Mar 2018 18:03:13 GMT):
Has left the channel.

rbuysse (Wed, 21 Mar 2018 18:03:31 GMT):
Has joined the channel.

amundson (Wed, 21 Mar 2018 18:20:14 GMT):
@adamludvik quite similar to our discussions

pschwarz (Wed, 21 Mar 2018 18:25:49 GMT):
That looks interesting, will give it a read

Dan (Wed, 21 Mar 2018 18:31:43 GMT):
rbuysse

rbuysse (Wed, 21 Mar 2018 18:33:41 GMT):
Has left the channel.

rbuysse (Wed, 21 Mar 2018 18:33:52 GMT):
Has joined the channel.

adamludvik (Wed, 21 Mar 2018 18:41:46 GMT):
Does everyone agree that in order to do run-time state-pruning correctly, there needs to be some form of reference counting at each node in the merkle tree? The model described in that ethereum blog post uses reference counting and drops state after some number of blocks.

Dan (Wed, 21 Mar 2018 18:46:53 GMT):
@MicBowman did you have some state pruning logic around the v0.4 timeframe?

rnagler (Wed, 21 Mar 2018 19:12:45 GMT):
Has joined the channel.

MicBowman (Wed, 21 Mar 2018 23:38:45 GMT):
@dan sorry for the slow response... yes

MicBowman (Wed, 21 Mar 2018 23:38:59 GMT):
or at least state compression

amundson (Thu, 22 Mar 2018 00:18:04 GMT):
@MicBowman can you elaborate on compression vs. pruning?

rjones (Thu, 22 Mar 2018 19:04:20 GMT):
Has left the channel.

adamludvik (Thu, 22 Mar 2018 20:19:38 GMT):
The Sawtooth Governance Model RFC has entered the final comment period and can be found here: https://github.com/hyperledger/sawtooth-rfcs/pull/6. @amundson @pschwarz @Dan @agunde @jsmitchell please confirm your approval of the RFC. Once the RFC is merged, we can create the initial sub teams.

adamludvik (Thu, 22 Mar 2018 20:19:54 GMT):
or maybe most of you already did...

kelly_ (Thu, 22 Mar 2018 23:18:24 GMT):
@adamludvik I'd like to see broader representation on the product/user, compliance, and research side in terms of core team

kelly_ (Thu, 22 Mar 2018 23:19:09 GMT):
I for one have contributed no code, but the verbiage says "The root team includes stakeholders who are actively involved in the Sawtooth community and have expertise within the project."

kelly_ (Thu, 22 Mar 2018 23:19:29 GMT):
I think @TomBarnes would be another valuable add as well

kelly_ (Thu, 22 Mar 2018 23:20:31 GMT):
"Steering the project toward specific use cases where Sawtooth can have a major impact"

kelly_ (Thu, 22 Mar 2018 23:21:05 GMT):
wrt my role with Sawtooth I think that I could be useful here, and bringing a customer/ecosystem perspective

kelly_ (Thu, 22 Mar 2018 23:22:12 GMT):
will add comments to the Github

TomBarnes (Thu, 22 Mar 2018 23:23:21 GMT):
I think the core team would be better served by having a balanced representation of key stakeholders. I would like to suggest that we revise it to include Shawn, James, and Peter from Bitwise, and Dan, Tom, and Kelly from Intel.

TomBarnes (Thu, 22 Mar 2018 23:24:52 GMT):
I also think the initial core sub-team membership should be documented as either part of the governance PR, or one immediately following, so that it is clearly articulated to ourselves and the community.

kelly_ (Thu, 22 Mar 2018 23:27:24 GMT):
apologize if I am late in commenting, this was my first time seeing this document

TomBarnes (Thu, 22 Mar 2018 23:29:14 GMT):
I apologize for not raising this concern in yesterday's discussion - I did express concern about technical merit being the sole determinant for inclusion in teams, but was unable to more clearly articulate it.

amundson (Fri, 23 Mar 2018 00:50:24 GMT):
@kelly_ sorry for the confusion. seems like we have more discussions to have. the intent was sub-teams largely do the voting on RFCs, with a broad involvement there (including all the examples you gave, localized to those sub-teams). root team was picked from developers involved in the current code base's construction and had a lead role in architecture/design and day-to-day development. for an open source project, those seemed like fair criteria for initial selection. the bar was set very high, and it thus excludes a fair number of developers that work on Sawtooth every day.

kelly_ (Fri, 23 Mar 2018 01:12:59 GMT):
yep understood @amundson. i'd still like to be considered for the high level team, I haven't seen the sub teams yet so will look into that

kelly_ (Fri, 23 Mar 2018 01:13:05 GMT):
oops @amundson

kelly_ (Fri, 23 Mar 2018 01:13:48 GMT):
w.r.t open source project i think it's fair to consider non-development because there is a lot more involved than just the development piece, e.g. evangelism, customer engagement, funding, etc.

kelly_ (Fri, 23 Mar 2018 01:14:31 GMT):
also understand that there are some regular developers that have been excluded

kelly_ (Fri, 23 Mar 2018 01:15:15 GMT):
I think @TomBarnes makes a valid point with it being 'represented by key stakeholders'

kelly_ (Fri, 23 Mar 2018 01:15:40 GMT):
also ideally would like the core maintainers to expand beyond intel and bitwise, and think folks like @grkvlt are on that path

grkvlt (Fri, 23 Mar 2018 01:16:05 GMT):
Seems that this is like the PMC in an Apache project? as opposed to the people who have a commit bit...

kelly_ (Fri, 23 Mar 2018 01:17:01 GMT):
@grkvlt yep, that is similar to what i was thinking

grkvlt (Fri, 23 Mar 2018 01:17:16 GMT):
And PMC is a strict subset of Committers, normally

grkvlt (Fri, 23 Mar 2018 01:17:32 GMT):
OK

kelly_ (Fri, 23 Mar 2018 01:20:07 GMT):
oh, that was not my understanding

kelly_ (Fri, 23 Mar 2018 01:20:11 GMT):
in fact the opposite

kelly_ (Fri, 23 Mar 2018 01:20:24 GMT):
The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms.

kelly_ (Fri, 23 Mar 2018 01:20:32 GMT):
so that is probably a little to 'legal' oriented

kelly_ (Fri, 23 Mar 2018 01:20:52 GMT):
too*

kelly_ (Fri, 23 Mar 2018 01:23:05 GMT):
So I was thinking more about the role, vs the selection criteria

grkvlt (Fri, 23 Mar 2018 01:23:54 GMT):
TBH, in Apache that's the way it is as well, practically speaking. The fact that PMC members have commit rights means they either founded the project or used to be an active developer, they may no longer be writing code but they still are active managing the project

kelly_ (Fri, 23 Mar 2018 01:24:12 GMT):
yep exactly

grkvlt (Fri, 23 Mar 2018 01:25:27 GMT):
Of course if PMC members want to write code too, that's even better!

kelly_ (Fri, 23 Mar 2018 01:28:47 GMT):
Yea, I mean to be fully transparent, I have an issue with not having a voice on setting the direction and values of the project. I've been instrumental in the founding of Hyperledger, getting Sawtooth brought into it, obtaining funding for the majority of Sawtooth development (which has been done both internally and externally), bringing users and developers into the project, and evangelizing for it. I'd like to also think I've had some influence into the technical direction in the beginning of the project, and at a minimum helping to drive requirements and technical direction

kelly_ (Fri, 23 Mar 2018 01:29:52 GMT):
So I recognize that this is open source and code counts, but I think it's a bit myopic to only give those directly writing code a say

kelly_ (Fri, 23 Mar 2018 01:31:49 GMT):
at the highest level I would say clearly Intel and Bitwise have the most invested in the development of Sawtooth and I'd like to continue to see more equitable 'joint-ownership' if you will

kelly_ (Fri, 23 Mar 2018 01:34:58 GMT):
My off the cuff thought on maintainers is that the split between 'product' and 'developer' should be relative to the needs of what they are maintaining

kelly_ (Fri, 23 Mar 2018 01:35:14 GMT):
so a community outreach subteam may have 1 developer and 5 evangelists

kelly_ (Fri, 23 Mar 2018 01:35:26 GMT):
where sawtooth core team may be 4 developers and 2 product people

kelly_ (Fri, 23 Mar 2018 01:36:35 GMT):
and I think the overall Sawtooth project likely needs to include (at some point), developers, architects, product, marketing/design, and compliance/legal

grkvlt (Fri, 23 Mar 2018 01:38:08 GMT):
One thing that worries (?) me a little bit about this RFC and governance model is that it seems to be Sawtooth specific, in that it had to be created from an existing Rust project document. I'd have thought that the Hyperledger Foundation itself would have provided the model and structure for the projects it incubates. So, should Fabric, Sawtooth, Iroha, whatever, not all have the same structure and policies? This is based on my experience as a PMC member and committer at multiple ASF projects, where Apache provides a lot of structure and guidance. Not sure about CNCF, they may be more free-form, like Hyperledger.

kelly_ (Fri, 23 Mar 2018 01:38:34 GMT):
there is some overall LF governance

kelly_ (Fri, 23 Mar 2018 01:38:51 GMT):
but it doesn't discuss project specifics (e.g. +2 for a merge)

grkvlt (Fri, 23 Mar 2018 01:40:01 GMT):
Maybe there should be foundation wide standards, and the Sawtooth decisions could become the template, then?

kelly_ (Fri, 23 Mar 2018 01:41:31 GMT):
this is about the extent i've seen from Fabric - https://hyperledger-fabric.readthedocs.io/en/release-1.1/CONTRIBUTING.html#maintainers

kelly_ (Fri, 23 Mar 2018 01:42:07 GMT):
which does differ from the proposed RFC

kelly_ (Fri, 23 Mar 2018 01:42:15 GMT):
e.g. majority rule for adding a maintainer vs unanimous

grkvlt (Fri, 23 Mar 2018 01:46:02 GMT):
Yeah, I prefer the Apache model which is that a proposal passes if there is at least one PMC +1 vote and no -1 votes.

grkvlt (Fri, 23 Mar 2018 01:47:31 GMT):
The PMC will have members who may not be able to give a binding +1 due to insufficient knowledge in the specific area, so unanimous agreement is not always possible

grkvlt (Fri, 23 Mar 2018 01:48:21 GMT):
Especially true if the PMC is made up of non-active developers, like you propose, which I think is the right decision

kelly_ (Fri, 23 Mar 2018 01:50:56 GMT):
@grkvlt you clearly have a lot of experience on this, any feedback on the RFC would be appreciated

kelly_ (Fri, 23 Mar 2018 01:51:22 GMT):
@amundson let's chat tomorrow, i'm about to head out for dinner

grkvlt (Fri, 23 Mar 2018 14:03:23 GMT):
I added some comments to the PR. In particular I think the sub-teams should be project based, for things that have a deliverable artifact like Seth or the Go SDK, not cross-cutting concerns like CI or release management, and also that the reviewer/committer/maintainer split is too complex, and could be replaced with maintainer only. This is basically the way ASF PMCs and permissions work, which is what I'm familiar with...

amundson (Fri, 23 Mar 2018 14:04:12 GMT):
@grkvlt I don't think defining this cross-project at the HL level would be productive, though other projects have and are welcome to follow our lead.

grkvlt (Fri, 23 Mar 2018 14:04:54 GMT):
sure, i think we should get it working right with sawtooth before suggesting it as the one true Hyperledger way ;)

amundson (Fri, 23 Mar 2018 14:08:22 GMT):
I think it is important to consider that the projects all have different people and operate in very different ways currently. As a HL whole we couldn't even all agree to use github; and that's fine, as long as we allow projects to do what works for them (and not against them).

amundson (Fri, 23 Mar 2018 14:21:15 GMT):
The reason we have sub-teams as we do in that list of examples is very intentionally not at the repository level. As one project, we need to make sure we work together as a whole. SDKs for example. We have some principles on SDKs currently and not all SDKs adhere to them. Those SDKs need work prior to being considered complete, having a stable API, or being mature. The SDK sub-team would vote on RFCs that would define those criteria or proposals to change the SDKs (which should mostly all have the same features and feel). If we wanted to add SDK code to handle batch submissions to the REST API, the SDK team should consider that across SDKs. The SDK sub-team therefore should be comprised from maintainers across SDKs so that we both get good representation for those decisions.

amundson (Fri, 23 Mar 2018 14:22:20 GMT):
It is also important to point out that while that sounds like only the sub-team is involved, that is absolutely not the case. The intent is that the sub-team is where we drive consensus but that the discussion is open to participation by anyone.

amundson (Fri, 23 Mar 2018 14:33:07 GMT):
As currently written, being on a sub-team does not grant you maintainership at the repo level. I think at the repo level, it is very important that maintainers know the code intimately. So we wouldn't expect someone on the consensus sub-team who has not committed anything to PoET to be approving PoET PRs. That is best left to those that intimately know that code base.

cheetara (Fri, 23 Mar 2018 14:33:20 GMT):
Has joined the channel.

grkvlt (Fri, 23 Mar 2018 14:42:00 GMT):
ok, sure, but it seems like a lot of overhead in terms of managing users and rights.

grkvlt (Fri, 23 Mar 2018 14:42:40 GMT):
also, not suggesting a 1-1 mapping of teams to repos, more like 1-many, where eg sdk team manages several repos, for each language sdk

grkvlt (Fri, 23 Mar 2018 14:43:39 GMT):
also, at some point you just have to trust the developers...

grkvlt (Fri, 23 Mar 2018 14:46:06 GMT):
the thing that appealed to me about sawtooth and seth was how i was able to get involved without too much in the way of barriers to entry, that's a really good way of getting people to help with your project, and imo you don't want to lose that

grkvlt (Fri, 23 Mar 2018 14:48:12 GMT):
it can be quite intimidating coming to a new open source project and not being sure if you're doing the right thing, following the rules properly etc. the ASF helps because i know all projects there have a similar management structure, and familiarity with one is good for the rest. fortunately everyone here has been really helpful and encouraging, so as long as we keep that up, great

adamludvik (Fri, 23 Mar 2018 14:55:35 GMT):
@grkvlt @kelly_ really good thoughts and feedback

adamludvik (Fri, 23 Mar 2018 15:06:27 GMT):
Especially about keeping the community open, helpful, and encouraging. I think the goal is to make transparent much of the existing structure within the project so that we can continue to grow as a community in a healthy way.

amundson (Fri, 23 Mar 2018 15:30:03 GMT):
@grkvlt the levels reviewer/committer/maintainer are there because it maps well to github (though we didn't want to call out the mechanics in the RFC). there is a desire to let contributors review PRs even if they don't contribute code (thus read-only github perms and the 'reviewer' group); to be able to allow trusted folks to click merge ('committer') without reaching the high bar of maintainer; and then maintainers being the approvers.

amundson (Fri, 23 Mar 2018 15:30:32 GMT):
it is not intended to be complex, but it does make more explicit some things that are just implied today

grkvlt (Fri, 23 Mar 2018 15:31:10 GMT):
i guess you can use github teams/groups to separate roles?

amundson (Fri, 23 Mar 2018 15:33:23 GMT):
so, as an example, once poet spins off, maybe I get committer rights but I'm not a poet maintainer. I can review, create PRs, and merge, but maintainers of PoET that are working on that code and know it well are the maintainers (and thus their approval is required to merge).

amundson (Fri, 23 Mar 2018 15:34:51 GMT):
yes, we have the ability to create groups with the roles in github. I suspect 'reviewer' is one large list across the project, maybe 'committer' is project specific or project-wide, and then maintainer groups would be managed more specifically to the repo (and matching MAINTAINER.md we add to the repos).

amundson (Fri, 23 Mar 2018 15:36:32 GMT):
the reviewer level is going to help a lot when folks are just joining the project because the threshold can be low. we can also add folks from other HL projects or with just a passing interest so they can help with specific PR reviews.

amundson (Fri, 23 Mar 2018 15:37:07 GMT):
(I find it super annoying when you can't add someone as a reviewer)

grkvlt (Fri, 23 Mar 2018 16:02:38 GMT):
ok, that makes more sense, especially if it makes it easier to bring people in to the project quickly at a low level - this should be a very light touch process, then more formal when we want to give actual write access

yoni (Tue, 27 Mar 2018 13:29:14 GMT):
Has joined the channel.

MicBowman (Tue, 27 Mar 2018 22:05:32 GMT):
@amundson the apt repo fails for python 3.6... is there a reason for the restriction?

amundson (Tue, 27 Mar 2018 22:44:35 GMT):
There is no restriction, but it is an Ubuntu 16.04 repo

amundson (Tue, 27 Mar 2018 22:45:17 GMT):
(no restriction other than it's all compiled against 16.04 stuff)

rjones (Tue, 27 Mar 2018 23:46:08 GMT):
Has joined the channel.

MicBowman (Tue, 27 Mar 2018 23:51:22 GMT):
do you know what doesn't work? trying to run on 17.10... as far as i can tell, everything should run just fine

MicBowman (Tue, 27 Mar 2018 23:51:58 GMT):
we've been able to install libsecp256k1 by installing the stock library and then the python binding
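
A sketch of the workaround MicBowman describes for 17.10; the exact Ubuntu package names and the `secp256k1` Python binding are assumptions:

```bash
# Install the stock system library first, then the Python binding,
# which can link against it instead of compiling its own copy.
sudo apt-get install libsecp256k1-0 libsecp256k1-dev
pip3 install secp256k1
```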

kelly_ (Wed, 28 Mar 2018 01:32:08 GMT):
@yoni since your question involves changes to the Sawtooth SDK it is probably better asked in this channel as there is less noise

kelly_ (Wed, 28 Mar 2018 01:32:25 GMT):
copying the question over here

kelly_ (Wed, 28 Mar 2018 01:32:27 GMT):
"second try, your inputs will be much appreciated :) Working on private ledger design and need to re-validate the header signature from within the C++ transaction processor (decrypt signature with public key and make sure it matches hash of transaction header). This require a change to sawtooth SDK since currently the transaction object that comes into TP in the apply method is a class that wraps the protobuf header and this class prevents access to the serialized header bytes. There are 2 ways we thought about on how to make this change in the SDK and would like to open a discussion here on which approach is better. Option 1: add getter to the txn wrapping class that will re-serialize the protobuf transaction header and return it. Option 2: add flag when register transaction processor that will state that this TP should receive transaction as protobuf serialized object and not with the wrapper class. I have tested option 1 locally and it worked fine (changed sawtooth_sdk.h and transaction_handler.h)"

kelly_ (Wed, 28 Mar 2018 01:33:46 GMT):
I think @amundson is the current expert on SDKs

kelly_ (Wed, 28 Mar 2018 01:35:53 GMT):
I think @EugeneYYY is on the c++ sdk but others probably have an opinion on how they would prefer it structured

EugeneYYY (Wed, 28 Mar 2018 01:35:53 GMT):
Has joined the channel.

kelly_ (Wed, 28 Mar 2018 01:36:42 GMT):
which I think is @zac @pschwarz @boydjohnson

kelly_ (Wed, 28 Mar 2018 01:36:52 GMT):
and @agunde too

amundson (Wed, 28 Mar 2018 01:58:18 GMT):
@MicBowman where I would start on that is getting the dependency debs to compile on 17.10, then compile sawtooth on 17.10. I've been meaning to take a look but haven't found the time. sawtooth-core/bin/build_deps builds the dependencies.

amundson (Wed, 28 Mar 2018 01:58:30 GMT):
Ryan and I have been working on a new build methodology that decreases the amount of custom scripts (the RFC for this is nearly finished, and we have some prototypes done), but it is probably realistically a month out yet. (My only point is don't get too aggressive redoing the build since something better is coming.)

amundson (Wed, 28 Mar 2018 01:59:07 GMT):
sorry, the name of that script is bin/build_ext_debs

amundson (Wed, 28 Mar 2018 02:00:15 GMT):
there is a comment at the top of the file on how to run it with "-t ubuntu:xenial"

amundson (Wed, 28 Mar 2018 02:00:36 GMT):
(so you will want to change that to the 17.10 release name)
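
Putting the steps above together as a sketch; `artful` is the Ubuntu 17.10 release name, and the `-t` usage follows the comment amundson mentions at the top of the script:

```bash
# From a sawtooth-core checkout: build the external dependency debs
# against a 17.10 base image instead of the default 16.04 (xenial) one.
./bin/build_ext_debs -t ubuntu:artful
```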

amundson (Wed, 28 Mar 2018 02:02:02 GMT):
if we get those compiling, we can add them to a 17.10 repo on repo.sawtooth.me

amundson (Wed, 28 Mar 2018 02:06:49 GMT):
after that, to get sawtooth debs built, we will need to adjust bin/build_all to use a 17.10 docker image. you can look in there, and then realize why we are redoing the build system. anyway, the relevant function in that file is build_debs(), which gets called if you do 'bin/build_all debs' (I think, I didn't just try that). It references ci/sawtooth-build-debs, which is the docker image we need to make 17.10. sawtooth-core/ci/sawtooth-build-debs also has some docs at the top on how to run it directly but usually it is run through bin/build_all.

amundson (Wed, 28 Mar 2018 02:07:44 GMT):
once we know what modifications we need to ci/sawtooth-build-debs, then it should be straightforward to put it together and start producing debs for both 16.04 and 17.10 when we do releases, etc.

amundson (Wed, 28 Mar 2018 02:12:43 GMT):
@yoni @kelly_ this is maybe not the right channel. it is too locked down to be useful for such discussions. but I'll answer it here anyway since I didn't see it in the other channel.

amundson (Wed, 28 Mar 2018 02:22:23 GMT):
This was discussed (last week?) in a previous engineering call. This change will require an RFC. Option 1 will not be considered, since we have a fundamental and strict rule against re-serializing transaction/batch/block bytes as it introduces an unnecessary point for indeterminism. However, you could prototype your work doing that approach by forking the SDK. There are two other options. Option 2, as you describe, might be implemented by adding a field to TpRegisterRequest called 'raw_process_request' (a bool), and then either a) adding a new message TpProcessRawRequest (and maybe TpProcessRawResponse) which is the same as TpProcessRequest but with header as bytes; or b) adding an additional field to TpProcessRequest called 'header_raw' or 'raw_header' which is used instead of 'header' if raw_process_request was set during registration.

amundson (Wed, 28 Mar 2018 02:24:06 GMT):
Option 3 would be adding additional TP_* methods to retrieve the raw transaction data specifically, similar to state requests. This option is very inefficient.

amundson (Wed, 28 Mar 2018 02:25:36 GMT):
I think I'm preferring Option 2b, though we need to fully explore the impact this may have on backward compatibility (I don't expect unsolvable problems).

amundson (Wed, 28 Mar 2018 02:27:10 GMT):
That would cover changes to the validator. Then we would need to determine how this should be presented to the user in the various SDKs. Ideally, it would be invisible because most TPs won't need this feature.

amundson (Wed, 28 Mar 2018 02:28:56 GMT):
Backward compatibility is the largest constraint. We will support all existing TP implementations as they are today (for the 'stable/mature' SDKs anyway) without requiring modification, and that includes using old SDKs to interface with the validator. I don't think there are problems here that can't be solved but it is a topic the RFC should cover.

amundson (Wed, 28 Mar 2018 02:33:52 GMT):
I think if we had an example in python and C++, then we could ask the other SDK maintainers for samples for the other SDKs

rjones (Wed, 28 Mar 2018 03:53:26 GMT):
May I rename this channel to #sawtooth-maintainers ?

amundson (Wed, 28 Mar 2018 12:40:40 GMT):
@rjones let's not rename it yet; we should consider it with some other channels but we haven't decided what we want them to be just yet

pschwarz (Wed, 28 Mar 2018 14:38:48 GMT):
Created the branch `1-0-staging-00` for backports. Please create backport PRs against this branch. It will be used for running regression tests.
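
A minimal sketch of opening a backport PR against the staging branch, assuming `upstream` points at hyperledger/sawtooth-core and `origin` at a fork:

```bash
git fetch upstream
git checkout -b my-backport upstream/1-0-staging-00
git cherry-pick <commit-from-master>   # resolve any conflicts
git push origin my-backport            # then open a PR against 1-0-staging-00
```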

MicBowman (Wed, 28 Mar 2018 15:08:27 GMT):
thanks, @amundson

kelly_ (Wed, 28 Mar 2018 15:10:27 GMT):
@amundson what about a sawtooth-sdk channel?

kelly_ (Wed, 28 Mar 2018 15:10:58 GMT):
just trying to find a spot where @yoni can have a discussion with sdk owners that isn't so crowded with the getting started folks in the general sawtooth channel

amundson (Wed, 28 Mar 2018 16:10:10 GMT):
that will probably be one of them, but we should hold off so they match sub-teams as we create them

amundson (Wed, 28 Mar 2018 16:10:59 GMT):
chat channels being too busy is a nice problem to have :)

MicBowman (Wed, 28 Mar 2018 16:28:16 GMT):
how do you want to refer to the sawtooth burrow evm in the smart contract paper?

MicBowman (Wed, 28 Mar 2018 16:28:41 GMT):
i believe it is used as Hyperledger Sawtooth Burrow-EVM

amundson (Wed, 28 Mar 2018 17:47:01 GMT):
Seth, Sawtooth Seth, Hyperledger Sawtooth Seth

rjones (Fri, 30 Mar 2018 02:25:52 GMT):
rbuysse

rjones (Fri, 30 Mar 2018 02:29:31 GMT):
jsmitchell

jsmitchell (Fri, 30 Mar 2018 02:40:22 GMT):
Test

rjones (Fri, 30 Mar 2018 02:40:34 GMT):
hey!

rbuysse (Fri, 30 Mar 2018 03:33:01 GMT):
:thumbsup:

victer (Fri, 30 Mar 2018 11:39:28 GMT):
Has joined the channel.

kelly_ (Mon, 02 Apr 2018 21:13:43 GMT):
@rjones can we rename sawtooth-hyperdirectory to sawtooth-next-directory per the branding guidance from greg wallace?

rjones (Mon, 02 Apr 2018 21:15:24 GMT):
On GitHub? Please send email to helpdesk@hyperledger.org

kelly_ (Mon, 02 Apr 2018 21:17:02 GMT):
will do, thanks!

rjones (Tue, 03 Apr 2018 00:08:49 GMT):
@rbuysse @rberg2 @amundson @Dan do you see the alert here? https://github.com/hyperledger/sawtooth-next-directory

TomBarnes (Tue, 03 Apr 2018 00:12:26 GMT):
Hi, Ry - Tom Barnes here - I do not see any alert when I navigate to https://github.com/hyperledger/sawtooth-next-directory

rjones (Tue, 03 Apr 2018 00:15:22 GMT):
@TomBarnes you aren't an admin of sawtooth on github

TomBarnes (Tue, 03 Apr 2018 00:15:40 GMT):
no i don't think i am

rjones (Tue, 03 Apr 2018 00:17:03 GMT):
right, I'm saying, you aren't. the issue is in one of your dependencies having a CVE

TomBarnes (Tue, 03 Apr 2018 00:18:24 GMT):
sawtooth-core or sawtooth-next-directory?

rjones (Tue, 03 Apr 2018 00:19:26 GMT):
this group: https://github.com/orgs/hyperledger/teams/sawtooth-core-admins

TomBarnes (Tue, 03 Apr 2018 00:20:22 GMT):
i guess i'll have to leave it to the admins

rjones (Tue, 03 Apr 2018 15:18:41 GMT):
rberg2

amundson (Wed, 04 Apr 2018 16:22:50 GMT):
@rjones @kelly_ is following up with some of the next-directory devs

amundson (Wed, 04 Apr 2018 16:28:42 GMT):
early access to some RFCs:

amundson (Wed, 04 Apr 2018 16:28:50 GMT):
https://github.com/Cargill/sawtooth-rfcs/blob/c003-supply-chain-expand-data-types/text/0000-supply-chain-expand-data-types.md

amundson (Wed, 04 Apr 2018 16:29:08 GMT):
https://github.com/Cargill/sawtooth-rfcs/blob/c004-supply-chain-universal-client/text/0000-supply-chain-universal-client.md

amundson (Wed, 04 Apr 2018 16:29:23 GMT):
https://github.com/Cargill/sawtooth-rfcs/blob/c005-supply-chain-property-references/text/0000-supply-chain-property-references.md

amundson (Wed, 04 Apr 2018 16:29:36 GMT):
https://github.com/Cargill/sawtooth-rfcs/blob/c007-supply-chain-client-sdk/text/0000-supply-chain-client-sdk.md

peakcodes (Fri, 06 Apr 2018 21:35:29 GMT):
Has joined the channel.

grkvlt (Sat, 07 Apr 2018 18:15:09 GMT):
what's the status of https://github.com/hyperledger/sawtooth-sdk-go currently? it looks like it's a copy of the current sawtooth-core, is that correct? it'd be nice if we used `git filter-branch` to create the repo this time, as it gives better history

grkvlt (Sat, 07 Apr 2018 18:45:00 GMT):
see https://github.com/grkvlt/tmp-hyperledger-sawtooth-go-sdk/commits/master for an example of what i mean (note only 70 commits, the ones relevant to the go SDK)

sv2011 (Sat, 07 Apr 2018 23:11:32 GMT):
Has joined the channel.

Anton 202 (Mon, 09 Apr 2018 04:46:06 GMT):
Has joined the channel.

adamludvik (Mon, 09 Apr 2018 14:33:37 GMT):
@grkvlt, @dplumb and @rberg2 are working on it.

adamludvik (Mon, 09 Apr 2018 14:35:15 GMT):
I'm not familiar with using `git filter-branch`. There is a PR that deletes a bunch and moves a bunch around to make it look like github.com/rberg2/sawtooth-sdk-go

grkvlt (Mon, 09 Apr 2018 14:35:18 GMT):
ok. fyi, the commands i used were: `git filter-branch --prune-empty --subdirectory-filter sdk/go master && git mv src/sawtooth_sdk/* . && rm -rf src`

grkvlt (Mon, 09 Apr 2018 14:35:55 GMT):
that does what we want, and only keeps commits that touch `sdk/go`

grkvlt (Mon, 09 Apr 2018 14:36:11 GMT):
which is better, i think
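
Expanding grkvlt's one-liner into the full sequence, as a sketch; the clone URL and target layout are assumptions:

```bash
# Work in a fresh clone, since filter-branch rewrites history in place.
git clone https://github.com/hyperledger/sawtooth-core sawtooth-sdk-go
cd sawtooth-sdk-go
# Keep only commits that touch sdk/go and make it the new repo root.
git filter-branch --prune-empty --subdirectory-filter sdk/go master
# Flatten the remaining src/sawtooth_sdk layout into the top level.
git mv src/sawtooth_sdk/* .
rm -rf src
git commit -am "Move Go SDK sources to repository root"
```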

adamludvik (Mon, 09 Apr 2018 14:36:43 GMT):
@amundson have you used `git filter-branch` before? Is that a better solution than keeping all history?

grkvlt (Mon, 09 Apr 2018 14:37:21 GMT):
@amundson see my repo above for what it would look like

rberg2 (Mon, 09 Apr 2018 14:45:38 GMT):
my latest attempts are here: https://github.com/rberg2/sawtooth-sdk-go. I am still having some issues getting the tests to pass due to import paths, and I need to figure out how to build the mocks, I think.

grkvlt (Mon, 09 Apr 2018 14:55:07 GMT):
@rberg2 yeah, you would be better off using the `filter-branch` command to create the repo, so you don't end up with ~6k commits that are not relevant. you can then cherry-pick the last few commits from your repo on top to get the equivalent. i'd be happy to set it all up for you, if you want? it'd take half an hour...
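
For reference, the full split being discussed amounts to roughly the following; the clone URL, remote name, and the `<first-new-commit>` placeholder are illustrative rather than quoted from this thread:
```sh
# Start from a fresh clone so filter-branch can't damage a working copy
git clone https://github.com/hyperledger/sawtooth-core sawtooth-sdk-go
cd sawtooth-sdk-go

# Keep only the ~70 commits that touch sdk/go, making that directory the repo root
git filter-branch --prune-empty --subdirectory-filter sdk/go master

# Flatten the remaining src layout (commands quoted from grkvlt above)
git mv src/sawtooth_sdk/* . && rm -rf src

# Then cherry-pick rberg2's newer commits on top to get the equivalent repo
git remote add rberg2 https://github.com/rberg2/sawtooth-sdk-go
git fetch rberg2
git cherry-pick <first-new-commit>..rberg2/master
```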

amundson (Mon, 09 Apr 2018 15:06:50 GMT):
@adamludvik yes, I have used it for that purpose in the past

amundson (Mon, 09 Apr 2018 15:07:54 GMT):
I think that if we don't need to edit past commits (which we have sign-offs on), we shouldn't do it.

amundson (Mon, 09 Apr 2018 15:08:35 GMT):
that would be my only concern with it though, like I said, I've used filter-branch a lot in the past

amundson (Mon, 09 Apr 2018 15:09:47 GMT):
if we were bad people and checked in a lot of jar files in history, certainly we would need to prune

grkvlt (Mon, 09 Apr 2018 15:11:18 GMT):
if there's a legal reason for keeping the commits intact, of course, but i'm not 100% sure there would be? the developers have already agreed that their code can be modified, which is all we are doing. i think it's a standard practice for splitting repositories, and it gives a neater history imo...

amundson (Mon, 09 Apr 2018 15:16:11 GMT):
Do we have a list of who the maintainers of the go sdk are?

amundson (Mon, 09 Apr 2018 15:17:31 GMT):
If so, probably makes sense to see if they have consensus on this topic

amundson (Mon, 09 Apr 2018 15:18:34 GMT):
I could see an argument to be made that, even given its small size currently, the repo size is still important to reduce because the repo itself is used within the dev workflow

amundson (Mon, 09 Apr 2018 15:20:13 GMT):
we would probably take the same approach for the other SDKs as well: Java, JavaScript, etc.

amundson (Mon, 09 Apr 2018 15:23:15 GMT):
@rjones would anyone at LF want to weigh in on this discussion (whether to modify commits when splitting repos)?

adamludvik (Mon, 09 Apr 2018 15:27:30 GMT):
We don't have a go sdk maintainer list yet, so a decision would be deferred to the core maintainers list for now. I am fine with either method.

Dan (Mon, 09 Apr 2018 15:44:10 GMT):
I say filter. Less is more.

rjones (Mon, 09 Apr 2018 16:26:00 GMT):
@amundson I don't remember who I was talking to about this - I was surprised you weren't using `filter-branch`. My _feeling_ about this is the codebase in total is what you need to obey the license for, since each commit is part of the codebase. The upside to `filter-branch` is commands like `git bisect` will still let you debug usefully. Having a thousand extra commits is going to make your life worse.

rjones (Mon, 09 Apr 2018 16:26:09 GMT):
s/you/they/

rjones (Mon, 09 Apr 2018 16:28:18 GMT):
@amundson @Dan given the canonical way to import code into a project under LF aegis is a squash commit...

amundson (Mon, 09 Apr 2018 16:30:04 GMT):
I almost threw up :)

amundson (Mon, 09 Apr 2018 16:30:26 GMT):
(re: squash commit)

amundson (Mon, 09 Apr 2018 16:30:56 GMT):
seems like fairly strong consensus brewing so far, if no one brings an alternate position

fedotovcorp (Tue, 10 Apr 2018 10:15:36 GMT):
Has joined the channel.

amundson (Tue, 10 Apr 2018 15:47:27 GMT):
RFC preview: https://github.com/Cargill/sawtooth-rfcs/blob/c006-docker-compose-builds/text/0000-docker-compose-builds.md

kelly_ (Tue, 10 Apr 2018 16:19:02 GMT):
@amundson you mentioned a sawtooth-website. there is some collateral I want to add in the next month or so on the marketing/community side, which would include things like 1) whitepaper 2) logos 3) press-kit (one pager, PDF overview) etc.

kelly_ (Tue, 10 Apr 2018 16:19:44 GMT):
do you think st-website is the appropriate place for those even if they don't necessarily get exposed via the website

kelly_ (Tue, 10 Apr 2018 16:19:53 GMT):
though most of them could easily (and probably should) be exposed via link

amundson (Tue, 10 Apr 2018 16:50:13 GMT):
probably, yes

amundson (Tue, 10 Apr 2018 16:52:52 GMT):
I'd like to get the website repo initialized later in the week or next week

kelly_ (Tue, 10 Apr 2018 17:17:26 GMT):
ok works for me

pschwarz (Tue, 10 Apr 2018 18:10:44 GMT):
RFC Preview: https://github.com/peterschwarz/sawtooth-rfcs/blob/state-pruning-change-log/text/0000-state-pruning.md

kelly_ (Tue, 10 Apr 2018 18:30:29 GMT):
@pschwarz, somewhat unrelated but I know diskIO is a bit of an issue for us and saw this getting added to GETH

kelly_ (Tue, 10 Apr 2018 18:30:30 GMT):
https://github.com/ethereum/go-ethereum/pull/15857

kelly_ (Tue, 10 Apr 2018 18:41:03 GMT):
this is one other recent piece of 'prior art' that discusses state pruning for geth - https://github.com/ethereumproject/go-ethereum/issues/440

kelly_ (Tue, 10 Apr 2018 18:46:34 GMT):
and... last thing re: prior art

kelly_ (Tue, 10 Apr 2018 18:46:38 GMT):
"Parity offers continuous state trie pruning. The default --pruning fast will keep only the latest 64 states by default. It's expected to grow at a rate of a few GB per year"

kelly_ (Tue, 10 Apr 2018 18:47:19 GMT):
there is information on parity's 4 pruning modes here - https://ethereum.stackexchange.com/questions/3332/what-is-the-parity-light-pruning-mode?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

kelly_ (Tue, 10 Apr 2018 18:53:14 GMT):
```
--pruning=[METHOD]       Configure pruning of the state/storage trie. METHOD may be one of
                         auto, archive, fast:
                           archive - keep all state trie data. No pruning.
                           fast    - maintain journal overlay. Fast but 50MB used.
                           auto    - use the method most recently synced or default to fast
                                     if none synced. (default: auto)
--pruning-history=[NUM]  Set a minimum number of recent states to keep in memory when
                         pruning is active. (default: 64)
--pruning-memory=[MB]    The ideal amount of memory in megabytes to use to store recent
                         states. As many states as possible will be kept within this limit,
                         and at least --pruning-history states will always be kept. (default: 32)
```

kelly_ (Tue, 10 Apr 2018 19:06:54 GMT):
@adamludvik - while also digging around, this PR for GETH has the interface for when a consensus engine needs to send p2p messages - https://github.com/ethereum/go-ethereum/pull/16385/files

adamludvik (Tue, 10 Apr 2018 20:31:08 GMT):
looks very similar to ours

pschwarz (Tue, 10 Apr 2018 20:50:15 GMT):
Thanks @kelly_ will look at those

pschwarz (Tue, 10 Apr 2018 21:28:33 GMT):
@amundson @Dan Backport PR: https://github.com/hyperledger/sawtooth-core/pull/1564

TheOnlyJoey (Thu, 12 Apr 2018 13:29:49 GMT):
Has joined the channel.

TheOnlyJoey (Thu, 12 Apr 2018 13:30:03 GMT):
Has left the channel.

amundson (Thu, 12 Apr 2018 15:17:11 GMT):
is everyone cool with doing this in all our docker files? - https://github.com/hyperledger/sawtooth-supply-chain/pull/45/files

amundson (Thu, 12 Apr 2018 15:17:36 GMT):
@rberg2 @rbuysse ^

amundson (Thu, 12 Apr 2018 15:19:20 GMT):
related to https://jira.hyperledger.org/browse/STL-1078

amundson (Thu, 12 Apr 2018 15:21:06 GMT):
likely this would not actually fix STL-1078 (which was a DNS error?) but maybe a good thing anyway? or is the complexity not worth it?

rbuysse (Thu, 12 Apr 2018 15:24:40 GMT):
I don't like the complexity

rbuysse (Thu, 12 Apr 2018 15:25:28 GMT):
if it's a problem people are seeing often we should just switch to the HA pool

rberg2 (Thu, 12 Apr 2018 15:25:29 GMT):
that seems like a fine idea to me, I have seen keyserver.ubuntu.com hiccup a few times over the years, or we could change the first apt-key to use the pool

askmish (Thu, 12 Apr 2018 15:31:00 GMT):
It's simple code, just an or operation. This is a common, standard fix for handling reliability issues when a keyserver is down or unresponsive.
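
The fix under discussion is just a shell-level fallback in the Dockerfiles, along these lines; the pool hostname and `$KEY_ID` are illustrative placeholders rather than values from the PR:
```sh
# Dockerfile sketch: try the primary keyserver, fall back to the HA pool on failure
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $KEY_ID \
 || apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys $KEY_ID
```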

adamludvik (Fri, 13 Apr 2018 18:01:36 GMT):
@Dan here is the Rust/Python logging facade I was talking about: https://github.com/hyperledger/sawtooth-core/commit/7ea8654ffc30c92f6250234e2f8e9dc892b61010

santiagop (Fri, 13 Apr 2018 20:07:37 GMT):
Has joined the channel.

adamludvik (Mon, 16 Apr 2018 15:52:57 GMT):
Merged: https://github.com/hyperledger/sawtooth-rfcs/pull/6

jaxdave (Mon, 16 Apr 2018 15:54:46 GMT):
Has joined the channel.

yosra (Tue, 17 Apr 2018 12:30:06 GMT):
Has joined the channel.

Dan (Wed, 18 Apr 2018 19:33:43 GMT):
@jsmitchell see if you can post

jsmitchell (Wed, 18 Apr 2018 21:31:28 GMT):
Nope

Dan (Thu, 19 Apr 2018 00:27:22 GMT):
Ok, well let me know if that changes.

rkrish82 (Thu, 19 Apr 2018 02:59:33 GMT):
Has joined the channel.

jsmitchell (Thu, 19 Apr 2018 03:20:12 GMT):
Yep

jsmitchell (Thu, 19 Apr 2018 03:20:19 GMT):
It actually did work

jsmitchell (Thu, 19 Apr 2018 03:20:33 GMT):
I just thought it was funny to post “Nope”

Dan (Thu, 19 Apr 2018 12:42:34 GMT):
I'm going to need a longer explanation to understand whether you can post in this channel.

rjones (Thu, 19 Apr 2018 13:50:39 GMT):
@Dan yes, I can post in this channel?

askmish (Fri, 20 Apr 2018 10:54:32 GMT):
Testing if I can post here

askmish (Fri, 20 Apr 2018 10:54:45 GMT):
can anyone see this?

amolk (Fri, 20 Apr 2018 12:08:41 GMT):
Nope

Dan (Fri, 20 Apr 2018 14:58:04 GMT):
Me neither

TomBarnes (Fri, 20 Apr 2018 16:11:48 GMT):
pankaj

TomBarnes (Fri, 20 Apr 2018 20:10:30 GMT):
yep

TomBarnes (Fri, 20 Apr 2018 20:10:50 GMT):
i mean nope

TomBarnes (Fri, 20 Apr 2018 20:11:00 GMT):
:)

jeffreychengmw (Mon, 23 Apr 2018 13:04:25 GMT):
Has joined the channel.

amundson (Mon, 23 Apr 2018 19:11:33 GMT):
I've submitted a RFC PR for Sawtooth Sabre (on-chain smart contracts executed in WebAssembly) - https://github.com/hyperledger/sawtooth-rfcs/pull/7

amundson (Mon, 23 Apr 2018 19:14:51 GMT):
@TomBarnes @Dan @jsmitchell @kelly_ @pschwarz @agunde @adamludvik - as root team members, please review this RFC ^

kelly_ (Mon, 23 Apr 2018 19:15:08 GMT):
:woo:

amundson (Mon, 23 Apr 2018 19:18:21 GMT):
A working implementation of Sabre exists which is capable of running Sawtooth Supply Chain with relatively minor modifications; that will be submitted soon (hopefully we can vote on the RFC prior to that)

jsmitchell (Mon, 23 Apr 2018 19:18:41 GMT):
reviewed

amundson (Mon, 23 Apr 2018 19:19:19 GMT):
Note that @jsmitchell had access to this for a long time, but held out feedback to get the first red x. :)

jsmitchell (Mon, 23 Apr 2018 19:19:50 GMT):
you think i just save up your typos for later use?

amundson (Mon, 23 Apr 2018 19:19:51 GMT):
s/reviewed/First!/

kelly_ (Mon, 23 Apr 2018 19:37:06 GMT):
@amundson quick question - and maybe better to put in the RFC comments, but it seems like the contract deployer gets to control the namespace that the contract writes to

kelly_ (Mon, 23 Apr 2018 19:37:21 GMT):
doesn't this make the contract less 'trustless' vs the code being the only one managing that namespace?

kelly_ (Mon, 23 Apr 2018 19:38:04 GMT):
what if i had supply chain writing into a namespace. then i deploy another contract and permit it to write to the same namespace, and i change the ownership of all supply chain assets to my pubkey

kelly_ (Mon, 23 Apr 2018 19:38:24 GMT):
is there something to prevent that?

jsmitchell (Mon, 23 Apr 2018 19:38:27 GMT):
that can happen now with regular TPs

jsmitchell (Mon, 23 Apr 2018 19:38:39 GMT):
and that is what the pokitdok contributed namespace thingy prevents

kelly_ (Mon, 23 Apr 2018 19:39:09 GMT):
right, so i'm wondering if there is a similar thing to isolate sabre contracts

jsmitchell (Mon, 23 Apr 2018 19:39:18 GMT):
so, there would need to be a mechanism to subpermit sabre contracts

kelly_ (Mon, 23 Apr 2018 19:39:30 GMT):
ok got it, just wanted to make sure i was understanding it correctly

jsmitchell (Mon, 23 Apr 2018 19:39:54 GMT):
@agunde ^ any thoughts? you've been closer to the conversations than i have

kelly_ (Mon, 23 Apr 2018 19:42:07 GMT):
in ethereum the default is that only that contract can write to that namespace, and if you want to update the smart contract then you design in a proxy contract to sit in between, which inherently introduces a new trust model around whether you trust the owner of the proxy contract

kelly_ (Mon, 23 Apr 2018 19:42:33 GMT):
not saying one is better than the other, just comparing/contrasting the defaults

agunde (Mon, 23 Apr 2018 19:52:16 GMT):
There is a concept of "owners" of a namespace. The second contract would need to be granted permission to read and write from the supply-chain namespace by those owners before it would be able to overwrite the supply-chain state.

kelly_ (Mon, 23 Apr 2018 20:09:34 GMT):
@agunde if you deployed a contract with a random owner (say pubkey=00000000000000000000000000000000), would that effectively lock the deployment so that from there on out, only that one contract could write to that namespace

kelly_ (Mon, 23 Apr 2018 20:13:30 GMT):
maybe thats a simple way to prevent additional contracts from writing to that namespace, or to prevent the owner from deleting/updating the contract once deployed

agunde (Mon, 23 Apr 2018 20:15:51 GMT):
There are administrators whose public keys are stored in the setting sawtooth.swa.administrators that could still override the permissions on that namespace.

kelly_ (Mon, 23 Apr 2018 20:19:33 GMT):
so there would still be some trust in the sawtooth administrators, but could you do the above to remove 'trust' in the person that deployed the smart contract?

agunde (Mon, 23 Apr 2018 20:26:08 GMT):
To be clear, there are two sets of owners: owners of a contract and owners of a namespace. These are not necessarily the same set. If the owner of a contract was set to an invalid key, no new versions would be able to be uploaded. The downside is the contract would also not be able to be deleted.

kelly_ (Mon, 23 Apr 2018 20:26:26 GMT):
got it, and similarly for a namespace too right?

agunde (Mon, 23 Apr 2018 20:26:38 GMT):
except for the sawtooth.swa.administrators, yes

kelly_ (Mon, 23 Apr 2018 20:26:48 GMT):
sweet, that actually sounds pretty useful

kelly_ (Mon, 23 Apr 2018 20:27:11 GMT):
I didn't know if there was something hard-coded where the deployer always had access

kelly_ (Mon, 23 Apr 2018 20:28:56 GMT):
one other random question. if you didn't want anyone as an admin of sawtooth network, is it possible to set all of the settings in the genesis block and then have a null value/non-functional value for swa.administrators

kelly_ (Mon, 23 Apr 2018 20:29:18 GMT):
or is a key required in swa.administrators to get the right TPs loaded after the genesis block

agunde (Mon, 23 Apr 2018 20:29:42 GMT):
I was thinking that by default the contract deployer was set as an owner. But you brought up good points on why you might not want to do that.

agunde (Mon, 23 Apr 2018 20:30:52 GMT):
Only swa.administrators are allowed to create the initial namespace registry.

kelly_ (Mon, 23 Apr 2018 20:31:39 GMT):
ok so you couldn't launch a functioning network without swa.administrators

agunde (Mon, 23 Apr 2018 20:32:16 GMT):
Right. Once the namespaces are created, though, you could null it out. You would just need to make sure that the namespace registries and owners are set up correctly.

kelly_ (Mon, 23 Apr 2018 20:32:26 GMT):
ok cool that makes a ton of sense

kelly_ (Mon, 23 Apr 2018 20:32:43 GMT):
I could see a variety of different deployment models where you may or may not want admins/namespace owners/contract owners

kelly_ (Mon, 23 Apr 2018 20:32:47 GMT):
so nice to have that flexibility

amundson (Tue, 24 Apr 2018 01:31:39 GMT):
So, as an example, if I'm the owner of the intkey namespace, I can allow specific contracts permissions to that namespace. The owner of the namespace thus must trust the owners of those contracts, who can upload new versions of the contracts. It is a form of delegation.

amundson (Tue, 24 Apr 2018 01:37:58 GMT):
It is intended to be permissioned but simple. We can add more complexity later. A system like the one used in Ethereum, where smart contracts can only access their own namespace, would be trivial, but it provides only a subset of the functionality.

deb (Tue, 24 Apr 2018 06:57:35 GMT):
Has joined the channel.

Subhadip 1 (Tue, 24 Apr 2018 06:57:46 GMT):
Has joined the channel.

aviralwal (Tue, 24 Apr 2018 10:40:56 GMT):
Has joined the channel.

Dan (Tue, 24 Apr 2018 15:35:23 GMT):
Just realized I don't think @achenette 's app dev guide changes, including setting up multi-node networks, made it into a release? Probably missed the window for release v1.0.3; what do we think about getting those into v1.0.4?

achenette (Tue, 24 Apr 2018 15:45:33 GMT):
That would be nice, especially if we could include this week's corrections for the multi-node procedure. I hope to have those corrections done soon (if Jenkins is willing and I can successfully coach sTyL3 through his first PR), so the repaired multi-node procedure would be available for v1.0.4.

Dan (Tue, 24 Apr 2018 15:55:09 GMT):
@TomBarnes @amundson regarding branch/tag on sawtooth-supply-chain, is this the right commit for the tip of that branch? https://github.com/hyperledger/sawtooth-supply-chain/commit/4701a5c0337b6d002349a9061d3dd7670e4c80e6. I think that's the last commit before culling python? Or we discussed just branching with the first stable rust. Is that the current head of master?

amundson (Tue, 24 Apr 2018 15:55:48 GMT):
@Dan I was literally just looking at that

Dan (Tue, 24 Apr 2018 15:56:01 GMT):
must be on same brain wave :)

Dan (Tue, 24 Apr 2018 15:56:15 GMT):
@achenette that would be great, can you pull together the requisite commits and submit them as backports to 1.0.

amundson (Tue, 24 Apr 2018 15:56:59 GMT):
@Dan that isn't the right one, but I'll find it

amundson (Tue, 24 Apr 2018 16:07:52 GMT):
First, I'm going to create a 0-8 branch in supply chain from the commit prior to Zac's 1.0 update (8b1551e54c838abfc399d295cdf717155f127356)

amundson (Tue, 24 Apr 2018 16:09:17 GMT):
Second, I think we should create a 0-9 branch starting with commit 40ae875e794c372e4342df46906b308aefb42059 which is prior to Rust changes going in; we should then backport non-Rust stuff from master to 0-9

amundson (Tue, 24 Apr 2018 16:09:28 GMT):
Third, master becomes 0.10.x
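
In git terms, the plan amounts to something like the following; the commit hashes are the ones quoted above, while the remote name is assumed:
```sh
# Cut maintenance branches at the quoted commits, then publish them
git branch 0-8 8b1551e54c838abfc399d295cdf717155f127356
git branch 0-9 40ae875e794c372e4342df46906b308aefb42059
git push origin 0-8 0-9
# master then carries the 0.10.x version bump
```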

Dan (Tue, 24 Apr 2018 18:40:40 GMT):
Backporting might not be important. I think if all stakeholders are comfortable with the current rust implementation, then the last python commit is good to have, but we don't need to 'maintain' it. v0.10.x should satisfy @tom 's need (supposing that point is stable).

amundson (Tue, 24 Apr 2018 18:49:21 GMT):
well, we are adding features to master, so it depends on what you mean by stable

amundson (Tue, 24 Apr 2018 18:51:31 GMT):
right now, supply chain master requires Sawtooth's 1.1.x SDK (from master), as we have not backported the Rust SDK to 1.0

amundson (Tue, 24 Apr 2018 18:52:31 GMT):
backporting the Rust SDK would probably make sense

Dan (Tue, 24 Apr 2018 19:20:25 GMT):
stable = not broke

Dan (Tue, 24 Apr 2018 19:36:15 GMT):
so i'm interpreting supply-chain 0.9 as python and 0.10 as rust. both supply-chain 0.9 and 0.10 work with sawtooth 1.0. assuming 0.10 is 'fully operational' (can destroy planets), then 0.9 is a historical footnote, and I don't think anyone needs backports to supply-chain 0.9. supply-chain master depends on sawtooth 1.1 features (i.e. sawtooth master) and would not be appropriate for e.g. AMIs. So backports / fixes to supply-chain 0.10 would be relevant.

Dan (Tue, 24 Apr 2018 19:39:13 GMT):
and if 0.9 == python then you may want 1 commit further ee22df267be45e271c1079af25393f07df102e0e which I think gets rid of a dead path for bandit. not a big deal if you've already branched.

amundson (Wed, 25 Apr 2018 00:50:17 GMT):
Commit ee22df267be45e271c1079af25393f07df102e0e is after some of the Rust commits. Using 'git log --graph' helps visualize how things are merged together. But, we can backport it.

techalchemist (Wed, 25 Apr 2018 10:51:03 GMT):
Has joined the channel.

Dan (Wed, 25 Apr 2018 13:13:28 GMT):
good tip!

Dan (Wed, 25 Apr 2018 13:28:24 GMT):
@askmish @amolk short notice, but I asked Adam if he could go over the consensus engine stuff at the sawtooth tech forum tomorrow. Would be great if you can make it. I think it is at 20:30 your time.

amundson (Wed, 25 Apr 2018 13:29:55 GMT):
0-8 and 0-9 branches now exist in sawtooth-supply-chain

amundson (Wed, 25 Apr 2018 13:30:35 GMT):
I'm testing the version change to master before I push it

amundson (Wed, 25 Apr 2018 13:37:41 GMT):
sawtooth-supply-chain master is now 0.10.x

adamludvik (Wed, 25 Apr 2018 17:35:54 GMT):

Consensus Engine SDK.pdf

adamludvik (Wed, 25 Apr 2018 17:36:23 GMT):
^ early preview of slides for tech forum tomorrow

pschwarz (Thu, 26 Apr 2018 16:34:16 GMT):
RFC for State Pruning: https://github.com/hyperledger/sawtooth-rfcs/pull/8

amolk (Thu, 26 Apr 2018 18:16:39 GMT):
Folks, is the RAFT code being re-written in rust?

Dan (Thu, 26 Apr 2018 18:58:10 GMT):
The intent is to use existing rust code if possible, then adapt or reimplement if necessary.

adamludvik (Thu, 26 Apr 2018 19:02:34 GMT):
IIRC, this is the most viable candidate, which is based on etcd's Go implementation of Raft: https://github.com/pingcap/raft-rs

Dan (Thu, 26 Apr 2018 19:06:27 GMT):
yep

amolk (Fri, 27 Apr 2018 01:26:16 GMT):
Sorry guys, I asked the wrong question.. Blame it all on the confusion between 'REST', 'Rust' and 'Raft'! And being half asleep ;)

amolk (Fri, 27 Apr 2018 01:27:36 GMT):
So here is the correct question: is the REST interface being rewritten in Rust?

amolk (Fri, 27 Apr 2018 03:31:59 GMT):
Or, to put it in another way, is Rest going to Rust? :)

amundson (Fri, 27 Apr 2018 14:14:02 GMT):
@amolk I'm in favor of porting the REST API to Rust at some point, but as far as I know, we are not doing it currently. For application REST APIs, some of us have started using this very nice framework - https://github.com/SergioBenitez/Rocket

amolk (Fri, 27 Apr 2018 14:15:16 GMT):
The reason I asked is because we've been working on increasing the unit test coverage of the REST API. If the code is being re-written, we won't bother raising PRs for it.

amundson (Fri, 27 Apr 2018 14:19:11 GMT):
@amolk have you considered writing them in a manner that is not tightly coupled to the current implementation? For example, as an integration or component test where the test runs in a separate process and makes HTTP calls to it?

amolk (Fri, 27 Apr 2018 14:21:16 GMT):
We're working on it on two fronts. The first is to just extend the existing unit tests to improve code coverage. The second is a more formal end-to-end test from a client perspective, first of the individual interfaces and then getting into some of the more involved usage scenarios.

amundson (Fri, 27 Apr 2018 14:21:32 GMT):
Even with unit tests, if they are well-documented (in API doc comments, for example), then we should be able to translate those to Rust at some point. So I think they still have value either way.

amolk (Fri, 27 Apr 2018 14:21:45 GMT):
ok

zac (Fri, 27 Apr 2018 14:22:57 GMT):
on the integration test front, I would recommend tightly coupling them to the Swagger spec if possible

zac (Fri, 27 Apr 2018 14:23:37 GMT):
We did that with a tool called Dredd in marketplace and it worked reasonably well

amundson (Fri, 27 Apr 2018 14:23:55 GMT):
I think I agree with that. The intent is that we are implementing that spec, so if the spec doesn't match the REST API behavior it is a bug in one of them.

zac (Fri, 27 Apr 2018 14:24:09 GMT):
yes, exactly

zac (Fri, 27 Apr 2018 14:25:36 GMT):
For unit tests, I wouldn't presume that the existing functions are necessarily the best structure for the REST API. If I were rewriting it (in Rust or otherwise), I would probably use it as an opportunity to do some refactoring.

zac (Fri, 27 Apr 2018 14:26:06 GMT):
Though obviously, if maintaining the existing method footprint (more or less) is important, that is doable.

amundson (Mon, 30 Apr 2018 14:47:45 GMT):
```
Proposed RocketChat channels:

#sawtooth
This channel is used for general sawtooth discussion, including user questions.

#sawtooth-announce
This channel is used to make announcements related to Sawtooth. This includes items such as releases, acceptance of RFCs, posting of news stories related to Sawtooth, etc. This is intended to be a low-volume no-discussion channel, and posting may thus be restricted; if it is, access should be managed by the Sawtooth community outreach subteam.

#sawtooth-core-dev
This channel is used for Sawtooth core development discussion, including discussion of the validator, CLI, REST API, and other core components. This is the primary forum for the future Sawtooth core subteam.

#sawtooth-consensus-dev
This channel is used for Sawtooth consensus development discussion, including discussion of consensus-related APIs and various consensus engine implementations. This is the primary forum for the future Sawtooth consensus subteam. This replaces #sawtooth-consensus.

#sawtooth-infra
This channel is used for Sawtooth infrastructure discussion, and is the primary forum for the future Sawtooth infrastructure subteam discussion and related RFCs. This replaces #sawtooth-ci.

#sawtooth-governance
This channel is used for discussion of Sawtooth governance topics. This is the primary forum for discussion by the Sawtooth root team. Discussion on this channel may be restricted to root team members if necessary to keep the discussion solely on governance topics which require root team member participation, such as discussion and voting on RFCs.

#sawtooth-outreach
This channel is used for discussion of Sawtooth community outreach initiatives, including documentation, website, demos, training, etc. This is the primary forum for the future Sawtooth community outreach subteam. This replaces #sawtooth-edx.

#sawtooth-release
This channel is used for release management discussions. This includes release topics such as dependency management, license compliance, branching strategies, etc. This channel will also contain status discussion during the execution of releases. This is the primary channel of the future Sawtooth release management subteam. Note that discussion on topics which require root team approval must occur in the #sawtooth-governance channel.

#sawtooth-sabre
This channel is for discussion of Sawtooth Sabre. This includes both application development and Sabre development. This is the primary forum for the future Sabre subteam.

#sawtooth-seth
This channel is for discussion of Sawtooth Seth. This includes both application development and Seth development. This is the primary forum for the future Seth subteam.

#sawtooth-sdk-dev
This channel is used for Sawtooth SDK development discussion. This includes all Sawtooth SDKs. This is the primary forum for the future Sawtooth application SDKs subteam.

#sawtooth-supply-chain
This channel is for discussion of Sawtooth Supply Chain. This includes both use of Sawtooth Supply Chain as well as development of Sawtooth Supply Chain.
```

amundson (Mon, 30 Apr 2018 14:49:21 GMT):
I'm trying to put together a plan for RocketChat channels - see above, and let's discuss.

Dan (Mon, 30 Apr 2018 14:56:23 GMT):
Looks good to me.

Dan (Mon, 30 Apr 2018 15:11:43 GMT):
@RyanBanks could you expand here a tad on the text you added to the Sabre RFC?

RobinBanks (Mon, 30 Apr 2018 15:23:40 GMT):
Has joined the channel.

RobinBanks (Mon, 30 Apr 2018 15:27:03 GMT):
Sure

RobinBanks (Mon, 30 Apr 2018 15:27:49 GMT):
Sabre is built on Wasmi

RobinBanks (Mon, 30 Apr 2018 15:27:50 GMT):
https://github.com/paritytech/wasmi

RobinBanks (Mon, 30 Apr 2018 15:29:47 GMT):
That library is responsible for interpreting the uploaded contracts. It doesn't have a mechanism built into it that can stop erroneous or malicious code.

RobinBanks (Mon, 30 Apr 2018 15:30:42 GMT):
So if a contract has a while true loop, it'll run forever.

RobinBanks (Mon, 30 Apr 2018 15:34:41 GMT):
However, we could submit a PR against wasmi to add an execution limit. Basically we'd need to replace the loop in `do_run_function` with a `for` and specify a maximum number of instructions that can be run.

RobinBanks (Mon, 30 Apr 2018 15:34:43 GMT):
https://github.com/paritytech/wasmi/blob/d926993c6c796ba09ffb70bf53c6b921b3c9acef/src/runner.rs#L130

RobinBanks (Mon, 30 Apr 2018 15:36:32 GMT):
Do you have any other questions @Dan?

tkuhrt (Mon, 30 Apr 2018 18:35:57 GMT):
@amundson : Regarding chat channels...can you help me understand subteams and how someone who is wanting to contribute to one of the Sawtooth components (core, seth, sabre, etc.) gets involved?

amundson (Mon, 30 Apr 2018 18:37:23 GMT):
@tkuhrt The two related documents are: https://github.com/hyperledger/sawtooth-rfcs/ https://github.com/hyperledger/sawtooth-rfcs/blob/master/text/0000-sawtooth-governance.md

Matthieu.inBlocks (Tue, 01 May 2018 13:08:32 GMT):
Has joined the channel.

Dan (Tue, 01 May 2018 16:28:02 GMT):
During sprint planning, Peter mentioned the state pruning RFC, reminding me I'm behind in reviewing that. Assuming others are too... https://github.com/hyperledger/sawtooth-rfcs/pull/8

jeremie.inblocks (Wed, 02 May 2018 15:30:04 GMT):
Has joined the channel.

pschwarz (Wed, 02 May 2018 18:39:54 GMT):
Thanks for the mention @Dan

amundson (Wed, 02 May 2018 21:00:13 GMT):
@agunde @TomBarnes @adamludvik @Dan @jsmitchell @kelly_ @pschwarz I've proposed the Sabre RFC be merged, please vote. The PR is https://github.com/hyperledger/sawtooth-rfcs/pull/7

jsmitchell (Wed, 02 May 2018 21:09:16 GMT):
voted

pschwarz (Wed, 02 May 2018 21:09:44 GMT):
check

amundson (Thu, 03 May 2018 02:49:59 GMT):
@MicBowman @Dan and others - Nice work on the Smart Contracts paper!

tungdt_socoboy (Sat, 05 May 2018 11:43:00 GMT):
Has joined the channel.

zac (Sun, 06 May 2018 20:26:03 GMT):
Added a new JIRA task to add a `Context.new_private_key_from_hex` method to the various SDKs and signing modules. As I see it, this is the missing piece to make the `CryptoFactory` workflow usable. Currently contexts only have a `new_random_private_key` method, which means there is no way to use them with an existing private key; you have to import and instantiate `secp256k1.Secp256k1PrivateKey` directly. This defeats the purpose of having a `CryptoFactory`. https://jira.hyperledger.org/browse/STL-1231

amundson (Mon, 07 May 2018 04:33:00 GMT):
@zac added some comments. does your use case involve loading many private keys?

zac (Mon, 07 May 2018 05:10:04 GMT):
Not many at a time. Just a basic web app login.

zac (Mon, 07 May 2018 05:25:05 GMT):
However, creating a PrivateKey instance from a previously generated and stored private key is fundamental. I currently can't do that without importing the `Secp256k1PrivateKey` class directly, undermining the purpose of the `CryptoFactory`. If the factory is the best practice, I would like to use it for the educational code I am putting in front of students.

zac (Mon, 07 May 2018 05:25:17 GMT):
I added a JIRA comment.

dsl (Mon, 07 May 2018 05:53:52 GMT):
Has joined the channel.

zac (Mon, 07 May 2018 18:41:45 GMT):
I'm writing an RFC to create a Sawtooth Supply Chain subteam, and thinking about the member list. Sensible Bitwise inclusions would be myself, @amundson, @agunde, and possibly @jsmitchell. I think it would be good to have at least one Intel representative as well. @kelly_ ? Maybe @Dan if Kelly didn't want to?

zac (Mon, 07 May 2018 18:42:03 GMT):
How do the people pinged feel about this?

rjones (Mon, 07 May 2018 22:54:49 GMT):
Are there any PRs stuck waiting on DCO bot, and DCO bot is not firing? if so, could you please add links here? https://github.com/probot/dco/issues/69 thank you

amundson (Tue, 08 May 2018 01:32:18 GMT):
There is now a #sawtooth-sabre channel

Dan (Tue, 08 May 2018 12:50:54 GMT):
@rjones thanks. I think we closed the one that was stuck. I haven't seen others yet.

danintel (Wed, 09 May 2018 21:33:44 GMT):
Has joined the channel.

lucienlu (Thu, 10 May 2018 11:57:50 GMT):
Has joined the channel.

Dan (Thu, 10 May 2018 14:38:16 GMT):
@TomBarnes @amundson see #TSC for some discussion on the copyright header

Dan (Thu, 10 May 2018 14:40:46 GMT):
One comment from LF legal is that the copyright notice is not required. Putting a notice at the top of a file does not assign copyright. A legal agreement must be executed to assign copyright. The notice at the top of a file is meant to help track attributions as files are copied outside of the originating project. I've asked for clarity regarding a copyright notice at the top of the source file vs some conventional way to reference the notices file.

Dan (Thu, 10 May 2018 14:41:04 GMT):
Conclusion is that this issue will go to the HL legal committee for some further discussion and direction.

amundson (Thu, 10 May 2018 16:45:37 GMT):
@Dan The current conventions seem appropriate, so I'm not sure why you are advocating for a change.

Dan (Thu, 10 May 2018 16:47:09 GMT):
Because having multiple copyright notices at the top of each file is unmanageable. And the frequency of Intel copyright notices has caused at least one contributor to think they needed to include that in their contribution - I think it's generally counter to growing the community.

amundson (Thu, 10 May 2018 18:07:59 GMT):
if you are interested in Sawtooth Sabre, please join #sawtooth-sabre

amundson (Thu, 10 May 2018 22:47:56 GMT):
Sabre Announcement: https://lists.hyperledger.org/g/sawtooth/message/280

tungdt_socoboy (Sun, 13 May 2018 15:46:16 GMT):
Hi everyone, I'm a newbie in Sawtooth development and I have one question; hope someone can help. I see on GitHub that the sawtooth-core repository is registered as a Python repository, is that true? In its source code, almost all components were built in Python (like the rest_api and the transaction families), but the validator was built in Rust, is that correct? Do you know why? Does Python have problems that make it insufficient for the Sawtooth validator?

rjones (Sun, 13 May 2018 16:12:37 GMT):
@tungdt_socoboy GitHub automagically guesses the code type.

amundson (Sun, 13 May 2018 16:17:17 GMT):
@Dan I'm not familiar with the HL legal committee. Who has a voice on that committee? Is this a committee formed by the TSC? Is it distinct from LF legal?

grkvlt (Sun, 13 May 2018 18:00:45 GMT):
@Dan I agree with the policy of not including copyright notices - I have also ended up creating new files that have apparent Intel copyright occasionally.

amundson (Sun, 13 May 2018 23:39:45 GMT):
Be careful what you wish for - the logical progression here is everyone making copyright declarations in every single commit message similar to the Signed-off-by statement.

amundson (Sun, 13 May 2018 23:49:57 GMT):
If the HL legal committee is tasked with finding suitable language for the top of the file, that's really insufficient, because it doesn't fully answer the question of how we might reconstruct an accurate copyright history.

amundson (Sun, 13 May 2018 23:54:37 GMT):
Currently, the Copyright at the top of the file is a more accurate method of determining copyright than the commit history, even if in some cases it is not completely accurate (those instances are probably easily identified by committer not acting on behalf of the copyright holder stated at the top of the file). If we flip it around, where we are expecting commit history to do the work for us, that requires more thought and likely more work at the commit level from everyone.

amundson (Mon, 14 May 2018 00:31:14 GMT):
@grkvlt re: "I think we should add support for SAWTOOTH_HOME and SAWTOOTH_KEYS environment variables to most Sawtooth utilities and services." -- we used to have similar support for this in 0-7 with the CURRENCY_* variables and dropped it when we transitioned to 0.8. I'd happily co-author and RFC with you on the larger scope of "path resolution process" if you are interested (and no one is dissenting on the general idea).

grkvlt (Mon, 14 May 2018 00:34:46 GMT):
@amundson really, copyright as a *legal* matter is, i believe, a non-issue, since everything is licensed as APACHE-2 and all contributors, by submitting commits, license their code as APACHE-2, so the problem is more that the copyright message is *wrong*

grkvlt (Mon, 14 May 2018 00:35:00 GMT):
but, as always, check with a lawyer ;)

amundson (Mon, 14 May 2018 00:36:00 GMT):
It is an issue, because the owner of the copyright is the only legal entity which can bring a lawsuit against an infringing party

amundson (Mon, 14 May 2018 00:36:34 GMT):
If we can not suitably determine copyright, it would probably hurt such a case in the future if one were necessary

grkvlt (Mon, 14 May 2018 00:36:39 GMT):
what kind of lawsuit?

grkvlt (Mon, 14 May 2018 00:37:26 GMT):
for the purposes of determining copyright you'll have to go on git history to get authorship, so those headers are not useful anyway

amundson (Mon, 14 May 2018 00:38:01 GMT):
authorship is different than who owns copyright, and for this project, that detail matters

grkvlt (Mon, 14 May 2018 00:39:43 GMT):
right, but i'm unclear how a 'Copyright 2017 Intel' message on a file will clarify that portions of that file are copyright 2018 Cloudsoft Corporation because I wrote them. it still seems like the notices aren't useful.

grkvlt (Mon, 14 May 2018 00:40:26 GMT):
i don't recall signing anything that assigned copyright to Intel, anyway

amundson (Mon, 14 May 2018 00:40:36 GMT):
well, current state, for those that have contributed for different copyright holders (Intel, Cargill, Bitwise IO, etc.) those current copyright headers are accurate

amundson (Mon, 14 May 2018 00:40:41 GMT):
(or accurate enough)

grkvlt (Mon, 14 May 2018 00:42:47 GMT):
but they'll diverge. and in the future will be more and more wrong. so, the solution used by most OSS projects is to not have copyright on files, just the license, and have (if deemed necessary) a NOTICE file with copyright statements, which is (I think) what @Dan and myself were suggesting

grkvlt (Mon, 14 May 2018 00:44:12 GMT):
it might be good to make `CONTRIBUTING.md` more explicit about copyright issues, too?

amundson (Mon, 14 May 2018 00:45:36 GMT):
Those projects are relying on being able to reverse engineer copyright from git history, which is why I bring up the issue with that for our project.

grkvlt (Mon, 14 May 2018 00:47:41 GMT):
the sawtooth contributing guidelines say that too - it states that the commit for a file (with sign-off) is the developer indicating compliance with the DCO (developer certificate of origin), which is the legal source of copyright, so `git blame` and authorship *are* the source of truth for sawtooth copyrights. > The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file http://developercertificate.org/

grkvlt (Mon, 14 May 2018 00:48:04 GMT):
https://sawtooth.hyperledger.org/docs/core/releases/latest/community/contributing.html > Each commit must include a “Signed-off-by” line in the commit message (git commit -s). This sign-off indicates that you agree the commit satisfies the Developer Certificate of Origin (DCO).

amundson (Mon, 14 May 2018 00:54:48 GMT):
Again, you are confusing authorship with the copyright holder. You can determine the individual that authored it but not necessarily the copyright holder. In some cases, but not all, you can derive one from the other.

amundson (Mon, 14 May 2018 00:57:23 GMT):
I'm not suggesting we can't make a change, but it should at least be well thought out, not on a whim without considering these issues.

grkvlt (Mon, 14 May 2018 01:06:00 GMT):
hm, think i see what you mean, but for those cases, surely that's pretty much what the NOTICES file is for? but, where there is a legal issue, as i understand it authorship is often exactly what they want to know, and lawyers then ask the author various things about their terms of employment, contracts and so on to determine the actual copyright holder. basically legal issues occur because the file says copyright B, and was written by A, but it turns out because A was a sub-contractor of C and did the work during office hours, the file is actually copyright C. so the *important* thing is authorship, as that lets us derive copyright in the instances where it has become a problem. the NOTICES file can give a good-faith set of copyright holders, but that's probably the best we can do.

grkvlt (Mon, 14 May 2018 01:07:16 GMT):
however, as you point out, let's not rush into anything!

amundson (Mon, 14 May 2018 01:07:41 GMT):
FWIW - not my first time dealing with this; in 1999 this is the approach we took w/GTK+ and GIMP - https://gitlab.gnome.org/GNOME/gtk/commit/279e878bddb61086f813385dc94fd04a5465473a

amundson (Mon, 14 May 2018 01:08:18 GMT):
also, now I feel old :)

grkvlt (Mon, 14 May 2018 01:12:18 GMT):
right, have the AUTHORS file as the list of contributors...

grkvlt (Mon, 14 May 2018 01:13:00 GMT):
would be interesting to have a definitive statement from linux foundation legal about the best practices they recommend

grkvlt (Mon, 14 May 2018 01:13:22 GMT):
(oh, and _why_ they recommend them, of course!)

amundson (Mon, 14 May 2018 01:35:40 GMT):
They aren't one of the parties which hold copyright, and it's much more important that all of the primary copyright holders and maintainers agree before we move forward. Maybe that's just a reasonable proposal we haven't seen yet.

paul.sitoh (Wed, 16 May 2018 16:21:00 GMT):
Has joined the channel.

Dan (Thu, 17 May 2018 12:57:45 GMT):
@pankajgoyal plz familiarize yourself with this: https://github.com/Cargill/sawtooth-rfcs/blob/c006-docker-compose-builds/text/0000-docker-compose-builds.md

svanschalkwyk (Fri, 18 May 2018 21:11:43 GMT):
Has joined the channel.

Sarah.Conway (Mon, 21 May 2018 19:48:52 GMT):
Has joined the channel.

Sarah.Conway (Mon, 21 May 2018 20:12:38 GMT):
hi all. I am new to working on marketing/PR for Hyperledger. We are writing a follow-up Consensus blog for the HL site that focuses on interoperability. Can someone share a few sentences, maybe 2-3, on what concrete progress we have made or are planning to make with #sawtooth on this front? Or feel free to point me to some urls, PPTs, etc. Thanks!

Johnjam (Tue, 22 May 2018 07:02:57 GMT):
Has joined the channel.

danintel (Tue, 22 May 2018 14:28:42 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=jAWsbeQsaRb7vJ7Zv) @Sarah Conway I would ask @Dan (not me--the other Dan). I understand a Raft consensus engine is planned. Raft uses an elected-leader mechanism and is Crash Fault Tolerant, but not Byzantine Fault Tolerant. Existing consensus engines are PoET simulator, PoET SGX, and (for development and testing only) Dev Mode.

rjones (Tue, 22 May 2018 14:29:32 GMT):
oh no the spaces in username bug strikes again

danintel (Tue, 22 May 2018 14:30:27 GMT):
Sawtooth uses a pluggable consensus engine mechanism. New engines can be added and can be changed on-the-fly

amundson (Tue, 22 May 2018 16:06:54 GMT):
I suggest we rename poet implementations to PoET/BFT (currently PoET/SGX) and PoET/CFT (currently PoET/Simulator) to clarify the capabilities. Also 'simulator' is misleading/confusing at this level since it just means "without SGX".

Dan (Tue, 22 May 2018 16:24:47 GMT):
Agreed. Been wanting to do that for a while. (poet flavor renaming, that is)

Dan (Tue, 22 May 2018 16:31:30 GMT):
@Sarah Conway Earlier this year Sawtooth 1.0 was released with Hyperledger's only Byzantine Fault Tolerant consensus and an industry-first feature we called Dynamic Consensus. Dynamic Consensus goes beyond pluggable consensus to allow networks to change consensus on the fly. Sawtooth supports 3 consensus protocols right now and two more are in development. Also in development is a change to the sawtooth consensus API that will allow consensus providers written in a variety of languages. This follows a similar pattern to Sawtooth's support for smart contracts in a variety of languages. This expands the breadth of possible consensus algorithms/protocols that can be easily coupled to Sawtooth.

Sarah.Conway (Tue, 22 May 2018 17:09:34 GMT):
@Dan thank you very much. This is great. We'll drop it into our blog draft.

Dan (Tue, 22 May 2018 17:28:19 GMT):
@Sarah Conway if it works with the blog flow you might also reach out to academia in the article... While Sawtooth is developed to be a production platform, some of us also consider Sawtooth as a great research platform for consensus and other blockchain areas. Where researchers are creating new algorithms, they might find Sawtooth makes a handy platform so that they can compare and contrast with other protocols and don't have to write all the networking and other components incidental to their algorithms.

rkrish82 (Wed, 23 May 2018 10:13:30 GMT):
Hi All, is there any support for Sawtooth on Ubuntu 18.04 LTS? In other words, how do I install Sawtooth on Ubuntu 18.04?

askmish (Wed, 23 May 2018 12:19:28 GMT):
No, we don't support Ubuntu 18.04 as of now

MicBowman (Wed, 23 May 2018 16:20:06 GMT):
@amundson does sawtooth use x509 cert for identifying validators?

MicBowman (Wed, 23 May 2018 16:20:30 GMT):
are you binding ecdsa keys to any concept of an institutional identity?

Dan (Wed, 23 May 2018 17:30:03 GMT):
We're interested in moving to Ubuntu 18. Certainly open to patches! :)

Dan (Wed, 23 May 2018 18:02:27 GMT):
Please join the Hyperledger Sawtooth Technical Forum on Thursday, May 24th at 10am CDT for the following discussion: PoET 2.0 Preview (Ashish Mishra) Join from PC, Mac, Linux, iOS or Android: https://zoom.us/my/hyperledger.community

amundson (Thu, 24 May 2018 00:21:30 GMT):
@MicBowman identity of a validator is the public portion of the ecdsa key. could tie that to a x509 cert via a transaction family, but I don't know of an implementation of that.

amundson (Thu, 24 May 2018 00:27:11 GMT):
@MicBowman Cargill has some not-yet-released stuff around identity that does tie ecdsa public keys to organizations; it is something we might bring to Sabre since a component of it is wasm-related.

fedealconada (Thu, 24 May 2018 09:05:11 GMT):
Has joined the channel.

fedealconada (Thu, 24 May 2018 09:05:34 GMT):
hi all, is it possible to have the CreateContract permission that Hyperledger Burrow has here in Sawtooth?

Dan (Thu, 24 May 2018 13:47:23 GMT):
@fedealconada check over in #sawtooth-seth - there's probably more :eyes: over there for EVM/Burrow+sawtooth stuff.

fedealconada (Thu, 24 May 2018 13:47:54 GMT):
great, thanks @Dan !

MicBowman (Thu, 24 May 2018 14:13:19 GMT):
@dampuero thanks

dampuero (Thu, 24 May 2018 14:13:19 GMT):
Has joined the channel.

john_whitton (Thu, 24 May 2018 16:53:51 GMT):
Has joined the channel.

amundson (Fri, 25 May 2018 18:45:46 GMT):
I'm proposing the Supply Chain Subteam RFC enter FCP - https://github.com/hyperledger/sawtooth-rfcs/pull/11 - @Dan @TomBarnes @agunde @adamludvik @jsmitchell @kelly_ @pschwarz

amundson (Fri, 25 May 2018 18:46:49 GMT):
I pre-populated the checklist with those that already approved or said lgtm or +1, but I'll give some time to remove approval before declaring FCP if those comments/approvals were not intended to be FCP-related

jsmitchell (Fri, 25 May 2018 18:49:10 GMT):
are those checkboxes some kind of special github syntax? Can you only check the one associated with your name?

amundson (Fri, 25 May 2018 18:49:26 GMT):
edit, then add an x like the others

amundson (Fri, 25 May 2018 18:49:58 GMT):
and no, there is no special permission system

jsmitchell (Fri, 25 May 2018 18:50:19 GMT):
can you see a diff history with blame on that message?

amundson (Fri, 25 May 2018 18:50:35 GMT):
AFAIK, no

amundson (Fri, 25 May 2018 18:51:52 GMT):
we could make the process such that you have to approve the PR, and then we will check your name off on the list, if you feel better about that

donatopellegrino (Mon, 28 May 2018 13:26:27 GMT):
Has joined the channel.

Dan (Tue, 29 May 2018 14:19:00 GMT):
when I looked at that a while back, it seemed like the rust guys were operating with the checkboxes w/o PRs / traceability. I think you need write access on the repo to do that? Anyway, I'm fine operating with less process until we have some issue that requires us to add more process. This is already pretty formal for the size of team we have.

nhrishi (Wed, 30 May 2018 01:25:38 GMT):
Has joined the channel.

tim-d-blue (Thu, 31 May 2018 15:13:38 GMT):
Has joined the channel.

zac (Thu, 31 May 2018 21:40:53 GMT):
Hey Sawtooth folks! RFCs for future Supply Chain development are going to start pouring in. Get ready to read and comment _a lot_ if you are interested in the platform.

zac (Thu, 31 May 2018 21:40:59 GMT):
First PR is here: https://github.com/hyperledger/sawtooth-rfcs/pull/13

zac (Thu, 31 May 2018 22:09:59 GMT):
And more: https://github.com/hyperledger/sawtooth-rfcs/pull/14 https://github.com/hyperledger/sawtooth-rfcs/pull/15

zac (Thu, 31 May 2018 22:13:12 GMT):
https://github.com/hyperledger/sawtooth-rfcs/pull/16

zac (Thu, 31 May 2018 22:18:12 GMT):
https://github.com/hyperledger/sawtooth-rfcs/pull/17

zac (Thu, 31 May 2018 22:18:19 GMT):
Hopefully that's enough

Dan (Mon, 04 Jun 2018 19:02:51 GMT):
do we have formal coding standards for rust? (like the python style rules we enforce with pylint?)

agunde (Mon, 04 Jun 2018 19:16:06 GMT):
https://github.com/rust-lang-nursery/rustfmt though it currently changes often.

Dan (Mon, 04 Jun 2018 21:42:54 GMT):
thanks. looks like that follows this style guide https://github.com/rust-lang-nursery/fmt-rfcs/blob/master/guide/guide.md.
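
For anyone wanting to run it locally, the invocation at the time was roughly the following; component and flag names shifted between toolchain releases, so treat this as a sketch:
```sh
rustup component add rustfmt-preview   # later renamed to plain "rustfmt"
cargo fmt --all                        # rewrite files in place
cargo fmt --all -- --check             # or only report files that would change
```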

danintel (Tue, 05 Jun 2018 14:30:56 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=5NJPCbecSzgTbzNAu) @Dan Speaking of pylint, I noticed the Python code is pylint3-dirty. Has pylint3 ever been run on the sawtooth-core code? Or is it just a very infrequent thing?

Dan (Tue, 05 Jun 2018 14:48:16 GMT):
Hi @danintel . bin/run_lint is run continuously. That uses pycodestyle and pylint. I haven't looked at pylint3. Note that we have config file(s?) for pylint to squelch some of the warnings.
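
For context, the lint run amounts to something like this; the rcfile and module paths are assumptions, not quoted from the repo:
```sh
bin/run_lint   # CI entry point: pycodestyle + pylint with the repo's config
pylint3 --rcfile=.pylintrc validator/sawtooth_validator   # pylint3 avoids some Python 3-only false positives
```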

danintel (Tue, 05 Jun 2018 16:04:11 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=G4vXWNjCswbK22LLA) Good to know. pylint has false positives that are suppressed with pylint3 when using Python 3-specific features, at least for me (such as with `print`).

danintel (Tue, 05 Jun 2018 22:41:15 GMT):
Is there a reason all the files in /var/lib/sawtooth/ are globally readable? (including the block chain and Merkle Trie)

Dan (Tue, 05 Jun 2018 23:52:18 GMT):
The data within the network goes to everyone in the network. Seems like it would be a narrow use case that would require the data to be unreadable to some users on the validator host. Do you have something in mind, or just general best practices on minimal permissions?

danintel (Wed, 06 Jun 2018 05:38:00 GMT):
No use case, but security-in-depth. If someone "breaks in" to a host as a regular user, it would be nice if they would not have access to all the data in a node (blockchain and state). This is distinct from everyone on the permissioned network having access to the data.

aaroncolaco (Wed, 06 Jun 2018 08:45:40 GMT):
Has joined the channel.

Dan (Wed, 06 Jun 2018 13:17:14 GMT):
That's reasonable. Personally I feel like most blockchain data is going to need to be stuff that isn't sensitive to exposure. Even if everything is encrypted and permissioned it's still shared with m individuals at n other companies. The business model around this is going to be in order to do business efficiently we all share such and such info.

amundson (Wed, 06 Jun 2018 17:42:34 GMT):
@danintel I believe the intent is /var/lib/sawtooth directory is created with 750 with user:group being sawtooth:sawtooth. For deb installs, this is done via validator/packaging/ubuntu/postinst. If you are seeing different behavior on a clean ubuntu install, it would be good to figure out why.

amundson (Wed, 06 Jun 2018 17:43:03 GMT):
in docker, none of that matters as everything is root
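
A quick way to check what is actually on disk, using the paths from this thread:
```sh
stat -c '%a %U:%G %n' /var/lib/sawtooth /var/log/sawtooth /etc/sawtooth
```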

danintel (Wed, 06 Jun 2018 22:30:52 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=z8sr5HKCBrwNui52R) @amundson I have /var/log/sawtooth/ as 750, but /var/adm/sawtooth/ is 755. This is after installing on two Ubuntu 16.04.4 systems following the instructions for Sawtooth install on Ubuntu

danintel (Wed, 06 Jun 2018 22:58:30 GMT):
I ran the script manually, `sudo bash -x ./validator/packaging/ubuntu/postinst configure`, and it fixed the `/var/lib/sawtooth/` permissions to 0750. The only error was a chmod on the non-existent `/etc/sawtooth/*.toml*` file

amundson (Wed, 06 Jun 2018 22:58:32 GMT):
hmm, what is in /var/adm/sawtooth?

danintel (Wed, 06 Jun 2018 22:59:40 GMT):
So then I did `apt remove python3-sawtooth-sdk; apt install python3-sawtooth-sdk` and the permissions of `/var/adm/sawtooth`, newly recreated, are 755

danintel (Wed, 06 Jun 2018 23:00:04 GMT):
Here is the contents (empty), after reinstall: `# ls -la /var/lib/sawtooth` `total 8` `drwxr-xr-x 2 sawtooth sawtooth 4096 Apr 30 12:26 .` `drwxr-xr-x 55 root root 4096 Jun 6 15:57 ..`

danintel (Wed, 06 Jun 2018 23:04:01 GMT):
Both /var/lib/sawtooth and /var/adm/sawtooth are 0755 and empty (after reinstall)

danintel (Wed, 06 Jun 2018 23:12:41 GMT):
This is a good demo of why the files, not just the containing directory, should have tightened permissions: multiple layers of defense (in this case against a packaging issue).

amundson (Thu, 07 Jun 2018 17:11:55 GMT):
why would the sdk package make any difference?

bridgerherman (Thu, 07 Jun 2018 18:38:00 GMT):
Has joined the channel.

danintel (Thu, 07 Jun 2018 20:09:38 GMT):
Because the directories are installed from package `python3-sawtooth-sdk`:
```
$ apt-get download python3-sawtooth-sdk
Get:1 http://repo.sawtooth.me/ubuntu/1.0/stable xenial/universe amd64 python3-sawtooth-sdk all 1.0.4-1 [33.2 kB]
$ dpkg --contents *.deb |grep /var/.../'$'
drwxr-xr-x root/root         0 2018-04-30 12:26 ./var/lib/
drwxr-xr-x root/root         0 2018-04-30 12:26 ./var/log/
```
Your question implies there is another package or another post-install method to set the permissions. That may be the root problem. If `python3-sawtooth-sdk` installs the directories and another package also installs them, and a `postinst` script or another script modifies the ownership/permissions, there may be a race condition here.

danintel (Thu, 07 Jun 2018 20:12:08 GMT):
So what package runs the `postinst` script? I think it should be `python3-sawtooth-sdk`, as this package installs the directories (at least they are in the package manifest).

danintel (Thu, 07 Jun 2018 23:31:13 GMT):
I also looked in the base `sawtooth_1.0.4_all.deb` Ubuntu package. Nothing there under `var/` (just `/usr/share/doc/sawtooth/`). Is that where `postinst` is run? If so, it may be overwritten by the later install of `python3-sawtooth-sdk`

danintel (Thu, 07 Jun 2018 23:42:44 GMT):
Or maybe fixing the `var/{log,lib}/` directory perms was done post-1.0.4 release?

amundson (Fri, 08 Jun 2018 02:55:55 GMT):
I think python3-sawtooth-sdk is broken if it is doing anything with /var/{log,lib}

amundson (Fri, 08 Jun 2018 02:58:27 GMT):
```
diff --git a/sdk/python/setup.py b/sdk/python/setup.py
index 799138b4..104478d1 100644
--- a/sdk/python/setup.py
+++ b/sdk/python/setup.py
@@ -15,28 +15,10 @@
 from __future__ import print_function

-import os
 import subprocess

 from setuptools import setup, find_packages

-
-if os.name == 'nt':
-    conf_dir = "C:\\Program Files (x86)\\Intel\\sawtooth\\conf"
-    data_dir = "C:\\Program Files (x86)\\Intel\\sawtooth\\data"
-    log_dir = "C:\\Program Files (x86)\\Intel\\sawtooth\\logs"
-else:
-    conf_dir = "/etc/sawtooth"
-    data_dir = "/var/lib/sawtooth"
-    log_dir = "/var/log/sawtooth"
-
-data_files = [
-    (conf_dir, []),
-    (os.path.join(conf_dir, "keys"), []),
-    (data_dir, []),
-    (log_dir, []),
-]
-
 setup(
     name='sawtooth-sdk',
     version=subprocess.check_output(
@@ -45,7 +27,6 @@ setup(
     author='Hyperledger Sawtooth',
     url='https://github.com/hyperledger/sawtooth-core',
     packages=find_packages(),
-    data_files=data_files,
     install_requires=[
         "colorlog",
         "sawtooth-signing",
```

amundson (Fri, 08 Jun 2018 02:58:39 GMT):
I didn't test that, but it probably fixes that issue

amundson (Fri, 08 Jun 2018 02:59:24 GMT):
@danintel ^

abraham (Fri, 08 Jun 2018 04:52:20 GMT):
Has joined the channel.

rberg2 (Fri, 08 Jun 2018 16:21:09 GMT):
I believe this commit to be the reason memory usage was so high in the 1-0-staging-01 branch we're testing to become 1.0.5: https://github.com/hyperledger/sawtooth-core/commit/c4c07fb70627cc2cf442ce4d888d5adf9f7eccf5

danintel (Fri, 08 Jun 2018 18:47:31 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=Hs2e6CKYQSe225XXb) Solved: The `/var/???/sawtooth` permissions are fixed in the nightly packages (`rwxr-x---`), but only if the directory didn't previously exist and didn't have leftover files from a previous install of the earlier 1.0.4.

Dan (Fri, 08 Jun 2018 19:35:34 GMT):
thanks @rberg2

Dan (Sun, 10 Jun 2018 19:48:47 GMT):
@pschwarz @adamludvik @amundson for the 1.0.5 issue, I think we either:
1. Remove the commit above
2. Shorten `base_keep_time`
3. Set a size limit on both caches

Option 2 seems like tuning a bandaid - i.e. we may tune for a while still without full resolution. Option 3 seems like surgery on a bandaid - i.e. making this class a conjunction of a timed cache and a circularly linked list has a big chance of introducing new bugs. I'm inclined towards dropping the commit and listing the issue in the release notes as a known issue.

amundson (Mon, 11 Jun 2018 06:41:55 GMT):
@Dan option 3 isn't practical as these objects are not pure caches

tungdt_socoboy (Mon, 11 Jun 2018 10:50:58 GMT):
Hi everyone, I have a quite simple question but couldn't find an answer yet; hope someone can help me. I know that Ethereum smart contracts can be supported in Sawtooth; I'm wondering what other kinds of smart contract were developed natively for Sawtooth. Natively, like Ethereum has Solidity smart contracts that run on the EVM: what kind of smart contract (chain code) is natively built for, and able to run on, Sawtooth? In general, a smart contract is chain code deployed and stored inside the blockchain data; is it the same in Sawtooth? Is a smart contract deployed (saved in the blockchain) and unable to be changed?

tungdt_socoboy (Mon, 11 Jun 2018 10:51:00 GMT):
Thank you

Dan (Mon, 11 Jun 2018 14:09:44 GMT):
@tungdt_socoboy that's a good question for #sawtooth - there's a lot of people there that can help answer that question. :)

pschwarz (Mon, 11 Jun 2018 14:13:52 GMT):
It has been a known issue for quite some time, so it's not new - it's just that this fix doesn't correct it in the 1.0.x branch

pschwarz (Mon, 11 Jun 2018 14:14:28 GMT):
It's a bandaid in master, but that should be replaced in the future in completely unbackportable ways.

Dan (Mon, 11 Jun 2018 14:22:59 GMT):
Right. So my recommendation is we drop the commit from 1.0.5. The other option I can come up with is to tune the bandaid, which seems like a waste of time.

pschwarz (Mon, 11 Jun 2018 14:23:28 GMT):
Yep

adamludvik (Mon, 11 Jun 2018 15:57:26 GMT):
@Dan Yeah, drop

markg 17 (Mon, 11 Jun 2018 22:36:12 GMT):
Has joined the channel.

tungdt_socoboy (Tue, 12 Jun 2018 07:35:26 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=w6bzqeapxKfMm5qX9) @Dan Thank you, I will forward it to the #sawtooth channel

Dan (Tue, 12 Jun 2018 16:17:52 GMT):
We should publish recommended system specs. (feedback from an HL discussion)

rjones (Tue, 12 Jun 2018 16:20:18 GMT):
the documentation never says "spinning rust is death"

pauljithink (Thu, 14 Jun 2018 01:39:47 GMT):
Has joined the channel.

rock_martin (Fri, 15 Jun 2018 13:04:49 GMT):
Has joined the channel.

zac (Mon, 18 Jun 2018 21:43:14 GMT):
:exclamation: PRs are up to remove the JS SDK from core :exclamation: https://github.com/hyperledger/sawtooth-core/pull/1730 https://github.com/hyperledger/sawtooth-sdk-javascript/pull/2

KevinODonnell (Tue, 19 Jun 2018 02:17:46 GMT):
Has joined the channel.

yadhuksp (Tue, 19 Jun 2018 05:11:47 GMT):
Has joined the channel.

Dan (Tue, 19 Jun 2018 19:53:59 GMT):
Guess what? We now have an SDK channel!! Yay let's all go over to #sawtooth-sdk-dev and talk about how we get all the SDKs out of core.

st (Wed, 20 Jun 2018 11:46:58 GMT):
Has joined the channel.

pyzhang (Thu, 21 Jun 2018 15:50:09 GMT):
Has joined the channel.

rjones (Fri, 22 Jun 2018 15:26:38 GMT):
Has left the channel.

neocameback (Tue, 26 Jun 2018 02:40:17 GMT):
Has joined the channel.

neocameback (Tue, 26 Jun 2018 02:46:49 GMT):
Hi everyone, I would like to understand why Sawtooth uses ZeroMQ instead of another MQ like RabbitMQ or Kafka. What was the reason here, and should I switch to Kafka in my own project? Could anyone here give me some pointers?

amundson (Tue, 26 Jun 2018 06:59:43 GMT):
@neocameback speed and maturity of the various language bindings; no centralized or additional processes to run

neocameback (Tue, 26 Jun 2018 08:22:57 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=3PE2NQzu4Rex9GizE) @amundson Thanks for sharing; could you please explain more about the speed and maturity of the language bindings? I'm quite clear on the centralized and additional processes point.

amundson (Tue, 26 Jun 2018 14:08:14 GMT):
@neocameback in terms of speed, 0MQ is quite fast. we tested some others when we made the selection (a couple years ago) and 0MQ was consistent across languages. for example, grpc was horrifically slow and bad in python but fine in some other languages.

neocameback (Wed, 27 Jun 2018 02:18:48 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=anvjPbJYGaR5dRDyu) @amundson Oh, I see, thank you very much. Those are important points behind the decision to use 0MQ.

rootDistress (Wed, 27 Jun 2018 07:23:48 GMT):
Has joined the channel.

TheOnlyJoey (Wed, 27 Jun 2018 10:37:49 GMT):
Has joined the channel.

TheOnlyJoey (Wed, 27 Jun 2018 10:39:39 GMT):
Good day, I am currently writing a module to replace the rest-api, using socket.io for communication, but I am having a bit of trouble finding where the actual calls to the blockchain are made in the REST API. Is there a simple API spec for the basic functions, or could someone point out the part that actually sends batches or receives data from the blockchain?

TheOnlyJoey (Wed, 27 Jun 2018 10:50:14 GMT):
seems like those are raw socket calls? `_socket.send_multipart` seems to be part of what makes things roll

TheOnlyJoey (Wed, 27 Jun 2018 10:58:29 GMT):
ah seems to use zmq?

amundson (Wed, 27 Jun 2018 13:30:37 GMT):
@TheOnlyJoey yes, 0MQ for underlying communication. messages sent are protobuf messages. look at protos/validator.proto and protos/client_*.proto
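
For reference, a minimal sketch of submitting a batch straight over 0MQ with those protos, assuming a signed `Batch` has already been built and a validator listening on tcp://localhost:4004 (just the general shape, not the REST API's actual implementation):
```python
from sawtooth_sdk.messaging.stream import Stream
from sawtooth_sdk.protobuf.client_batch_submit_pb2 import (
    ClientBatchSubmitRequest, ClientBatchSubmitResponse)
from sawtooth_sdk.protobuf.validator_pb2 import Message


def submit_batch(batch, url='tcp://localhost:4004'):
    """Send one pre-built, signed Batch to the validator over 0MQ."""
    stream = Stream(url)
    request = ClientBatchSubmitRequest(batches=[batch])
    future = stream.send(
        message_type=Message.CLIENT_BATCH_SUBMIT_REQUEST,
        content=request.SerializeToString())
    response = ClientBatchSubmitResponse()
    response.ParseFromString(future.result().content)
    stream.close()
    return response.status == ClientBatchSubmitResponse.OK
```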

TheOnlyJoey (Wed, 27 Jun 2018 13:37:39 GMT):
yeah, already figured out the protobuf part; actually thinking about just setting up an enet service to pingpong the protobuf strings from the blockchain to our backend

TheOnlyJoey (Wed, 27 Jun 2018 15:12:25 GMT):
Also, does Sawtooth support signing a transaction without a predetermined address? I want to use this to generate 'physical' tokens for easy distribution on our test network.

mychewcents (Wed, 27 Jun 2018 19:09:30 GMT):
Has joined the channel.

pschwarz (Thu, 28 Jun 2018 07:41:19 GMT):
I would suggest asking that on #sawtooth , since that is more of a general usage question, as opposed to core platform development

TheOnlyJoey (Thu, 28 Jun 2018 08:59:50 GMT):
Fair enough, thanks

neewy (Thu, 28 Jun 2018 12:21:02 GMT):
Has joined the channel.

neewy (Thu, 28 Jun 2018 12:25:17 GMT):
Hi everyone, are there any docs available on implementing a transaction processor? I am wondering how the API of the Context class works: how does a transaction processor take data from the Context class for validation?

kelly_ (Thu, 28 Jun 2018 14:48:39 GMT):
@amundson @TomBarnes @jsmitchell @Dan @agunde @adamludvik @pschwarz - Hey all, I wanted to see if we could have a root team call next week to discuss roadmap items for Sawtooth

kelly_ (Thu, 28 Jun 2018 14:49:42 GMT):
The idea would be to brainstorm and identify some larger scale features/improvements with some rough priority for implementation in the next 6-9 months

kelly_ (Thu, 28 Jun 2018 14:50:25 GMT):
Just general information sharing so that we can understand what folks are working on, what they see as highest priority improvements, etc.

kelly_ (Thu, 28 Jun 2018 14:51:04 GMT):
I'm free noon-2pm PST on Monday as one potential timeslot, but let me know what works well for y'all

zac (Thu, 28 Jun 2018 14:53:21 GMT):
@neewy https://sawtooth.hyperledger.org/docs/core/releases/latest/_autogen/sdk_TP_tutorial_python.html

neewy (Thu, 28 Jun 2018 14:54:56 GMT):
Zac, can you explain where I can find out more about how Context works?

zac (Thu, 28 Jun 2018 15:34:32 GMT):
https://sawtooth.hyperledger.org/docs/core/releases/latest/sdks/python_sdk/processor.html#module-processor.context

zac (Thu, 28 Jun 2018 15:35:27 GMT):
There is a search bar in those docs to the left btw

neewy (Thu, 28 Jun 2018 15:43:31 GMT):
Wow great

neewy (Thu, 28 Jun 2018 15:44:08 GMT):
How did you generate these docs?

zac (Thu, 28 Jun 2018 15:45:49 GMT):
¯\_(ツ)_/¯

zac (Thu, 28 Jun 2018 15:45:58 GMT):
Some Python tool

zac (Thu, 28 Jun 2018 15:46:24 GMT):
Sphinx does the overall docs

zac (Thu, 28 Jun 2018 15:46:40 GMT):
I'm not sure what generates the docs from the Python source

zac (Thu, 28 Jun 2018 15:47:23 GMT):
https://github.com/hyperledger/sawtooth-core/blob/master/docs/Makefile#L67

neewy (Thu, 28 Jun 2018 15:47:33 GMT):
I see there is a sawtooth_signing package. Do you use it so that it provides an interface for cryptographic functionality for TPs?

zac (Thu, 28 Jun 2018 15:47:58 GMT):
yes, and the validator itself uses it

neewy (Thu, 28 Jun 2018 15:48:01 GMT):
and by default is that secp256k1?

zac (Thu, 28 Jun 2018 15:48:13 GMT):
yes, that is the only option at the moment
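
For the record, a minimal sketch of that interface with the secp256k1 context (key handling simplified for illustration):
```python
from sawtooth_signing import CryptoFactory, create_context

# 'secp256k1' is the only algorithm name available at the moment.
context = create_context('secp256k1')
private_key = context.new_random_private_key()
signer = CryptoFactory(context).new_signer(private_key)

payload = b'some-payload-bytes'
signature = signer.sign(payload)
public_key = signer.get_public_key()

# The context can verify what the signer produced.
assert context.verify(signature, payload, public_key)
```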

neewy (Thu, 28 Jun 2018 15:49:00 GMT):
I bet you've heard about collisions in SHA-512, right?

jsmitchell (Thu, 28 Jun 2018 16:22:07 GMT):
@kelly_ can you shoot @mfford an email? He can help coordinate schedules. Thx

mfford (Thu, 28 Jun 2018 16:22:09 GMT):
Has joined the channel.

kelly_ (Thu, 28 Jun 2018 16:23:15 GMT):
ok @jsmitchell will do

kelly_ (Thu, 28 Jun 2018 16:47:41 GMT):
also @jsmitchell @amundson I'm going to finally have Dan mule some Sawtooth shirts back to MN this weekend

kelly_ (Thu, 28 Jun 2018 16:47:55 GMT):
so hopefully he can drop those off in the office sometime next week

Dan (Thu, 28 Jun 2018 16:48:19 GMT):
I'm having trouble fitting the shirts in balloons.

kelly_ (Thu, 28 Jun 2018 16:51:14 GMT):
we're just going to tape them around your body

Dan (Thu, 28 Jun 2018 16:51:57 GMT):
Well I wish I had known that before I got the first 3 hidden.

kelly_ (Thu, 28 Jun 2018 16:52:08 GMT):
http://i.dailymail.co.uk/i/pix/2016/02/29/16/31B3BA0400000578-3469636-image-m-23_1456761968908.jpg

Dan (Thu, 28 Jun 2018 16:52:39 GMT):
nice

rohitkhatri (Fri, 29 Jun 2018 14:08:17 GMT):
Has joined the channel.

rohitkhatri (Fri, 29 Jun 2018 14:08:42 GMT):
I have set up a `Hyperledger Sawtooth` network from the Sawtooth docs; you can find the `docker-compose.yaml` I used to set up the network here: https://sawtooth.hyperledger.org/docs/core/releases/1.0/app_developers_guide/sawtooth-default.yaml I'm running a custom `transaction processor`, and what's happening is that after some successful transactions, the batch status is stuck on `PENDING`, and when I check the logs of the `validator`, there's always an entry that says: ```Unable to find entry at address 5f68a3afa88f4a92fc362957d4c87101c884c97f2fcf92acbd512a2d12ef9d5bee55ee``` And in my `transaction processor`, I'm doing `console.log` so I can check whether the `validator` is calling the `apply` function of my processor, but I don't get any logs. In brief, after some transactions, the validator is not calling the `apply` function of my `transaction processor`. If anybody has faced this issue, please lend a hand.

pschwarz (Fri, 29 Jun 2018 14:17:38 GMT):
@rohitkhatri I would suggest that you ask in #sawtooth, which is more focused on app development, versus this channel which is for core development of the platform

rohitkhatri (Fri, 29 Jun 2018 14:18:04 GMT):
@pschwarz sure, thanks.

Dan (Fri, 29 Jun 2018 19:15:12 GMT):
Anyone seen this before? (via bin/run_tests -m cli) Seems like the validator container is expecting some consensus settings that it's not finding. Haven't tried this on jenkins yet, but at least it's failing on my mac.
```
validator-1_1 | [2018-06-29 19:03:58.259 INFO path] Skipping path loading from non-existent config file: /etc/sawtooth/path.toml
validator-1_1 | [2018-06-29 19:03:58.259 ERROR (unknown file)] error executing main
validator-1_1 | Traceback (most recent call last):
validator-1_1 |   File "/project/sawtooth-core/validator/sawtooth_validator/server/cli.py", line 104, in main
validator-1_1 |     bind_consensus=args['bind_consensus'],
validator-1_1 | KeyError: 'bind_consensus'
validator-3_1 | [2018-06-29 19:03:58.342 DEBUG ffi] loading library libsawtooth_validator.so
```

TheOnlyJoey (Fri, 06 Jul 2018 08:35:00 GMT):
Hmm is there documentation regarding doing communication directly over ZMQ instead of using the rest-api with sawtooth core?

TheOnlyJoey (Fri, 06 Jul 2018 08:35:18 GMT):
don't know if there is a small module somewhere for that

TheOnlyJoey (Fri, 06 Jul 2018 08:49:48 GMT):
basically all i want is to pass the serialized protobuf string from my backend directly to the blockchain without the rest-api

TheOnlyJoey (Fri, 06 Jul 2018 11:15:13 GMT):
the rest-api module is a bit confusing, a _lot_ of boilerplate

pschwarz (Fri, 06 Jul 2018 14:51:12 GMT):
@Dan What branch are you on?

Dan (Fri, 06 Jul 2018 14:51:43 GMT):
that was master a few days ago

pschwarz (Fri, 06 Jul 2018 14:51:44 GMT):
@TheOnlyJoey There is only documentation on using ZMQ directly for event subscriptions

pschwarz (Fri, 06 Jul 2018 14:52:11 GMT):
Look here for some info: https://sawtooth.hyperledger.org/docs/core/nightly/master/app_developers_guide/event_subscriptions.html

pschwarz (Fri, 06 Jul 2018 14:53:40 GMT):
Otherwise, you'll have to look at the protobuf messages themselves in the repo `sawtooth-core/protos`. The various client protobuf definitions will be the interesting ones for you
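
A rough sketch of the direct-ZMQ event subscription described in that doc (endpoint and event type are assumptions; error handling omitted):
```python
import zmq

from sawtooth_sdk.protobuf.client_event_pb2 import (
    ClientEventsSubscribeRequest, ClientEventsSubscribeResponse)
from sawtooth_sdk.protobuf.events_pb2 import EventSubscription
from sawtooth_sdk.protobuf.validator_pb2 import Message

ctx = zmq.Context()
sock = ctx.socket(zmq.DEALER)
sock.connect('tcp://localhost:4004')  # assumed validator endpoint

# Ask for block-commit events.
subscription = EventSubscription(event_type='sawtooth/block-commit')
request = ClientEventsSubscribeRequest(subscriptions=[subscription])
sock.send_multipart([Message(
    message_type=Message.CLIENT_EVENTS_SUBSCRIBE_REQUEST,
    correlation_id='sub-1',
    content=request.SerializeToString()).SerializeToString()])

# The first message back is the subscribe response; later ones carry events.
reply = Message()
reply.ParseFromString(sock.recv_multipart()[-1])
response = ClientEventsSubscribeResponse()
response.ParseFromString(reply.content)
print('subscribed:', response.status == ClientEventsSubscribeResponse.OK)
```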

Dan (Fri, 06 Jul 2018 14:55:01 GMT):
@pschwarz I think that was a red herring on the cli failure. i already pushed what I was doing to jenkins and it passed and was subsequently merged.

pschwarz (Fri, 06 Jul 2018 14:55:11 GMT):
Huh

pschwarz (Fri, 06 Jul 2018 14:55:27 GMT):
Well, clearly, I am behind on my :rocket: messages

kelly_ (Mon, 09 Jul 2018 23:49:02 GMT):
@boydjohnson @dplumb @Dan - T-mobile recently hired someone to drive the NEXT-Directory development forward

kelly_ (Mon, 09 Jul 2018 23:49:26 GMT):
his name is Richard, github username 'yunhangc'

kelly_ (Mon, 09 Jul 2018 23:50:36 GMT):
can we add him to the repo? also @Dan I noticed this PR has been approved but hasn't been merged - https://github.com/hyperledger/sawtooth-next-directory/pull/10

Dan (Mon, 09 Jul 2018 23:53:31 GMT):
I thought Mike had write access, but maybe not. I just merged it.

Dan (Mon, 09 Jul 2018 23:54:39 GMT):
As far as adding yunhangc, I think the usual procedure is that he needs to be added to the hyperledger group and then we can add him to a specific repo or team.

kelly_ (Mon, 09 Jul 2018 23:54:42 GMT):
great, thanks @Dan

kelly_ (Mon, 09 Jul 2018 23:55:42 GMT):
who needs to add him to HL?

kelly_ (Mon, 09 Jul 2018 23:55:47 GMT):
Ry?

kelly_ (Mon, 09 Jul 2018 23:56:03 GMT):
or can he add himself?

Dan (Mon, 09 Jul 2018 23:56:28 GMT):
We do need to get the maintainers formalized for that repo into a MAINTAINERS file. Then the policy can be straightforward like the other repos, i.e. the existing maintainers can add new maintainers.

kelly_ (Mon, 09 Jul 2018 23:56:54 GMT):
ok, I know Edan has improvements to push too I believe

Dan (Mon, 09 Jul 2018 23:57:12 GMT):
@rjones can you remind me how we add people to github teams as far as getting new IDs in to the hyperledger groups?

rjones (Mon, 09 Jul 2018 23:57:12 GMT):
Has joined the channel.

Dan (Mon, 09 Jul 2018 23:57:26 GMT):
I think anyone can put up a PR.

rjones (Tue, 10 Jul 2018 00:07:38 GMT):
@dan they need to be invited to the main group, then projects/etc can freely add or remove roles

rjones (Tue, 10 Jul 2018 00:07:58 GMT):
so send a ticket to helpdesk@hyperledger.org with github IDs and invites will be sent

Dan (Tue, 10 Jul 2018 00:11:13 GMT):
@ChrisSpanton ^

ChrisSpanton (Tue, 10 Jul 2018 00:11:13 GMT):
Has joined the channel.

ChrisSpanton (Tue, 10 Jul 2018 00:11:48 GMT):
Love it! Thanks guys

kelly_ (Tue, 10 Jul 2018 00:13:25 GMT):
FYI - https://chat.hyperledger.org/channel/sawtooth-next-directory

ChrisSpanton (Tue, 10 Jul 2018 00:14:08 GMT):
@kelly_ :woo:

rjones (Tue, 10 Jul 2018 00:15:28 GMT):
It's @Dan not @dan. @dan is some other guy, @kelly_ :)

rjones (Tue, 10 Jul 2018 00:15:48 GMT):
case tenderness helps nobody :(

kelly_ (Tue, 10 Jul 2018 00:21:25 GMT):
ah yes, I was wondering why I was having problems tagging him

shannynalayna (Tue, 10 Jul 2018 15:52:46 GMT):
Has joined the channel.

danintel (Wed, 11 Jul 2018 17:43:38 GMT):
Hmmm. Maybe I should change my handle to @dan. That will really confuse things :robot:

Dan (Thu, 12 Jul 2018 13:31:38 GMT):
That would be hilarious :grinning:

zac (Thu, 12 Jul 2018 14:26:47 GMT):
This PR for the JS SDK is in permanent "waiting for DCO status" mode: https://github.com/hyperledger/sawtooth-sdk-javascript/pull/2

zac (Thu, 12 Jul 2018 14:27:04 GMT):
I am told either @Dan or @rjones has a fix for this

rjones (Thu, 12 Jul 2018 14:28:41 GMT):
holy cow, I'm not going to review all those commits by hand. @Dan: if you tell me to merge it, I will do it on your say-so

Dan (Thu, 12 Jul 2018 14:28:41 GMT):
I don't have a fix, but I was copying them as bug cases to a dco github issue - which I think has since been closed

Dan (Thu, 12 Jul 2018 14:28:57 GMT):
Yes please merge.

rjones (Thu, 12 Jul 2018 14:28:57 GMT):
in the past it's been one or two commits to look at.

rjones (Thu, 12 Jul 2018 14:29:21 GMT):
please make that comment on the PR :)

Dan (Thu, 12 Jul 2018 14:29:29 GMT):
That PR is grabbing a chunk of sawtooth core and putting it in a new repo; that's why there are so many commits.

Dan (Thu, 12 Jul 2018 14:29:33 GMT):
roger wilco

zac (Thu, 12 Jul 2018 14:30:40 GMT):
I believe it has passed DCO in the past

zac (Thu, 12 Jul 2018 14:31:17 GMT):
Though I don't remember for sure

Dan (Thu, 12 Jul 2018 14:32:39 GMT):
You didn't add any new commits though. That was just picking up commits previously merged to core right?

zac (Thu, 12 Jul 2018 14:35:11 GMT):
I did add new commits

zac (Thu, 12 Jul 2018 14:35:16 GMT):
A couple dozen

zac (Thu, 12 Jul 2018 14:35:24 GMT):
They are all signed except for the merge commit

zac (Thu, 12 Jul 2018 14:35:45 GMT):
(I just double checked)

rjones (Thu, 12 Jul 2018 14:39:02 GMT):
what's done is done :)

Dan (Thu, 12 Jul 2018 14:43:21 GMT):
it's perfect

zac (Thu, 12 Jul 2018 14:45:45 GMT):
@Dan Assuming I can get the remove PR in core to pass its build, are we good to go ahead and merge that now? https://github.com/hyperledger/sawtooth-core/pull/1730

Dan (Thu, 12 Jul 2018 14:47:29 GMT):
meaning that now that the js sdk is in its own repo, can we remove it from core?

Dan (Thu, 12 Jul 2018 14:49:47 GMT):
I assume yes, and I've approved #1730.

zac (Thu, 12 Jul 2018 14:54:16 GMT):
Yeah

zac (Thu, 12 Jul 2018 14:54:39 GMT):
It has been approved for a little while, I just want to get final buy in from the various product owners before I actually pull the trigger

benoit.razet (Thu, 12 Jul 2018 16:49:32 GMT):
Has joined the channel.

sidhujag (Thu, 12 Jul 2018 18:58:32 GMT):
Has joined the channel.

PHeinz (Thu, 12 Jul 2018 18:58:37 GMT):
Has joined the channel.

FrankCastellucci (Thu, 12 Jul 2018 20:16:02 GMT):
Has joined the channel.

FrankCastellucci (Thu, 12 Jul 2018 20:45:42 GMT):
@jsmitchell Perhaps 'generational' wasn't the right term, as it can potentially be more complex. Datomic is a perfect example of the concept, wherein an identifiable data element (with a key) is never actually modified; instead, the original 'version' is stored and a new 'version' occupies the key location. In addition, references to the original are maintained, as that represented the reference state at the time of creation.

FrankCastellucci (Thu, 12 Jul 2018 20:46:41 GMT):
https://docs.datomic.com/cloud/whatis/data-model.html

jsmitchell (Thu, 12 Jul 2018 21:17:26 GMT):
well, that is somewhat like how the copy-on-write works in the merkle trie

jsmitchell (Thu, 12 Jul 2018 21:17:44 GMT):
we are storing all those versions (at least until they are pruned)

FrankCastellucci (Fri, 13 Jul 2018 00:48:39 GMT):
Can they be accessed? Do they have any meta-data associated with them?

RealDeanZhao (Fri, 13 Jul 2018 02:22:17 GMT):
Has joined the channel.

jsmitchell (Fri, 13 Jul 2018 13:59:21 GMT):
yes, they can be accessed via prior state root hashes

benoit.razet (Fri, 13 Jul 2018 14:06:06 GMT):
from a user's perspective, through the rest-api it's possible to get the blocks, which include the `state_root_hash` in their protobuf. So it's not too difficult to have a one-to-one mapping between the block ids and state_root_hash

benoit.razet (Fri, 13 Jul 2018 14:06:36 GMT):
the rest-api allows retrieving data from the merkle trie based on a block id

benoit.razet (Fri, 13 Jul 2018 14:07:06 GMT):
with the `/state` endpoint
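
Putting those two messages together, a small sketch against the REST API (the URL is an assumption; fields as returned by `/blocks` and `/state`):
```python
import requests

REST_API = 'http://localhost:8008'  # assumed REST API address

# Map block ids to the state root hash recorded in each block header.
blocks = requests.get(REST_API + '/blocks').json()['data']
for block in blocks:
    print(block['header_signature'][:8], '->',
          block['header']['state_root_hash'][:8])

# Read state as it existed at a particular block via ?head=<block_id>.
head = blocks[-1]['header_signature']
entries = requests.get(REST_API + '/state', params={'head': head}).json()['data']
print(len(entries), 'state entries at block', head[:8])
```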

FrankCastellucci (Fri, 13 Jul 2018 14:56:10 GMT):
@jsmitchell Is pruning automatic or configurable?

jsmitchell (Fri, 13 Jul 2018 14:56:38 GMT):
it's configurable by depth, i think

jsmitchell (Fri, 13 Jul 2018 14:56:56 GMT):
@pschwarz is the prunester, but he is on vacation until monday

FrankCastellucci (Fri, 13 Jul 2018 14:59:42 GMT):
OK, I'll rummage... the ideal is an endpoint that, given an address, returns its history (like an audit); something that can provide guarantees in an audit scenario is one use-case...

rjones (Fri, 13 Jul 2018 21:13:27 GMT):
@all : 2FA will soon be required across all GitHub orgs. If you do not have 2FA enabled for your account, you will be automatically removed from the Hyperledger org and will need to be re-added. Please check to ensure you have 2FA enabled. https://help.github.com/articles/securing-your-account-with-two-factor-authentication-2fa/

pidof (Sat, 14 Jul 2018 17:37:37 GMT):
Has joined the channel.

pidof (Sat, 14 Jul 2018 17:38:12 GMT):
ry love that policy ...

pschwarz (Mon, 16 Jul 2018 14:10:26 GMT):
@FrankCastellucci State pruning is configured via the `--state-pruning-block-depth` cli flag or `state_pruning_block_depth` configuration setting in your validator toml. It defaults to 100.

FrankCastellucci (Mon, 16 Jul 2018 14:31:38 GMT):
@pschwarz So, is that the # of changes to a particular address's data to support? If so, what happens when it is exceeded?

pschwarz (Mon, 16 Jul 2018 15:21:07 GMT):
No, pruning occurs at block boundaries

pschwarz (Mon, 16 Jul 2018 15:21:56 GMT):
So, a state root hash that is below that depth will be pruned, which means all values in the state tree that are no longer referenced by future trees will be deleted

kodonnel (Mon, 16 Jul 2018 15:33:48 GMT):
Has joined the channel.

FrankCastellucci (Mon, 16 Jul 2018 16:12:04 GMT):
@pschwarz Thanks for clarifying

benoit.razet (Tue, 17 Jul 2018 13:13:14 GMT):
Very interesting thread @pschwarz @FrankCastellucci . Out of curiosity, does it mean that, in the case where the content at a specific address (in the merkle tree) is changed several times by the transactions in a single block, only the last change would be retrievable, and all the intermediate changes non-retrievable (once the block is further back than the `state_pruning_block_depth`)?

jsmitchell (Tue, 17 Jul 2018 13:51:59 GMT):
@benoit.razet the only thing that hits state is the aggregate(final) set of address changes due to the transactions in the block. if multiple transactions in a single block modify an address, there will only be one 'set'. You could see the transaction level changes in the receipts if you needed to. Otherwise, writing timestamped history to a portion of state is an option.

benoit.razet (Tue, 17 Jul 2018 14:06:20 GMT):
thanks @jsmitchell for the answer and for the reminder about the timestamped history option

benoit.razet (Tue, 17 Jul 2018 14:12:06 GMT):
If a specific property (auditability, security, etc) has to be designed for a Dapp, it's always interesting to understand the fundamentals of sawtooth-core to be able to understand how this property can be ensured using sawtooth-core features or at the application level, or a mix of both. Thanks for the feedback, always insightful

FrankCastellucci (Tue, 17 Jul 2018 14:27:12 GMT):
What is the `timestamped history`? A TP provided piece of info?

jsmitchell (Tue, 17 Jul 2018 14:37:07 GMT):
yeah, could be a thing that the tp decides to write to state as part of processing transactions

Dan (Tue, 17 Jul 2018 14:40:05 GMT):
I thought that maybe meant the client submitting timestamps as part of the txn fields, like in supply chain. If the TP is going to do it, then I assume it has to use the blockinfo TP pattern.

jsmitchell (Tue, 17 Jul 2018 14:40:37 GMT):
well, the transaction could include a client submitted timestamp

jsmitchell (Tue, 17 Jul 2018 14:40:55 GMT):
but the transaction doesn't need to contain anything special for the 'log entry'

jsmitchell (Tue, 17 Jul 2018 14:45:38 GMT):
for example, in intkey, you could modify the transaction processor to write an ordered list of things that the transactions have done to that intkey address
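
A minimal sketch of that idea (not the real intkey handler; the payload shape and the `make_address` helper are assumed). Each apply derives its log entry only from the transaction itself, so it stays deterministic:
```python
import cbor  # intkey payloads are CBOR-encoded

from sawtooth_sdk.processor.exceptions import InvalidTransactionError


def apply_with_history(transaction, context, make_address):
    payload = cbor.loads(transaction.payload)
    name, verb, value = payload['Name'], payload['Verb'], payload['Value']
    address = make_address(name)  # assumed namespace/address helper

    entries = context.get_state([address])
    state = (cbor.loads(entries[0].data)
             if entries else {'value': 0, 'history': []})

    if verb == 'inc':
        state['value'] += value
    elif verb == 'dec':
        state['value'] -= value
    else:
        raise InvalidTransactionError('Unknown verb: {}'.format(verb))

    # Deterministic 'log entry': derived only from the transaction itself.
    state['history'].append((verb, value, transaction.signature))
    context.set_state({address: cbor.dumps(state)})
```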

FrankCastellucci (Tue, 17 Jul 2018 14:48:26 GMT):
I assume that a TP written timestamp would fail the hash compare (validator) as it would be different for each invocation of the TP on a single transaction, no?

jsmitchell (Tue, 17 Jul 2018 14:52:17 GMT):
yes, the tp can't just invent the timestamp

jsmitchell (Tue, 17 Jul 2018 14:52:39 GMT):
it either needs to be from the blockinfo state location (a la seth), or from the transaction

benoit.razet (Tue, 17 Jul 2018 14:59:32 GMT):
If the complete history of a state can't be retrieved from a validator, then include this history in the state at the tp level

rjones (Wed, 18 Jul 2018 17:23:00 GMT):
@Dan we need the update for Sawtooth: https://wiki.hyperledger.org/groups/tsc/project-updates/sawtooth-2018-jul prior to the TSC call

Dan (Wed, 18 Jul 2018 18:08:13 GMT):
I'm sure one of the other maintainers would love to do that sometime ;) Thanks for the ping @rjones. I'll get that together this afternoon.

Dan (Wed, 18 Jul 2018 18:54:06 GMT):
@here maintainers, let me know if there are any issues you would like raised with the TSC. In the past there were comments from some about feeling the TSC and projects were disconnected from one another; anything like that. I don't think @amundson is around, but he had mentioned recently being blocked from editing the project page.

Dan (Wed, 18 Jul 2018 21:21:56 GMT):
Here's what I've got so far... received feedback from Kelly but no-one else (yeah, kind of short notice, but I'm sure everyone is watching the TSC updates and knew this was due Thursday ;) )

Dan (Wed, 18 Jul 2018 21:21:58 GMT):
https://wiki.hyperledger.org/groups/tsc/project-updates/sawtooth-2018-jul?&#additional_information

jsmitchell (Wed, 18 Jul 2018 21:51:57 GMT):
@Dan s/our latest bug fix/our latest bug fix release/

jsmitchell (Wed, 18 Jul 2018 21:52:13 GMT):
s/given increase submissions/given increased submissions/

benoit.razet (Thu, 19 Jul 2018 01:09:46 GMT):
I've seen a couple of PRs recently related to the python->rust replacement of `block_info` and `sawtooth_settings`. Is the new Rust implementation of these TPs backward compatible with the retired Python version?

praspadm (Thu, 19 Jul 2018 05:57:31 GMT):
Has joined the channel.

johnfranklin (Thu, 19 Jul 2018 06:10:05 GMT):
Has joined the channel.

jsmitchell (Thu, 19 Jul 2018 13:21:08 GMT):
@benoit.razet that's definitely the intent

jeffhoekman (Thu, 19 Jul 2018 15:38:59 GMT):
Has joined the channel.

Dan (Thu, 19 Jul 2018 15:42:35 GMT):
Had a few comments from different sources that backpressure is appearing too aggressive. I don't have a concrete case/repro yet, but enough anecdotes that I wanted to share sooner rather than later.

johnsourour (Thu, 19 Jul 2018 17:13:40 GMT):
Has joined the channel.

FrankCastellucci (Thu, 19 Jul 2018 19:59:39 GMT):
It appears that in the python `sawtooth-sdk 1.0.4` the protobufs for SettingPayload, SettingProposal, et al. are *not* packaged.

FrankCastellucci (Thu, 19 Jul 2018 19:59:46 GMT):
am I missing something?

FrankCastellucci (Thu, 19 Jul 2018 20:02:56 GMT):
It has the singular 'setting' but not this one: https://github.com/hyperledger/sawtooth-core/tree/master/families/settings/protos

FrankCastellucci (Thu, 19 Jul 2018 20:22:40 GMT):
I'll create a subset in our app, but curious: A) was it left out for a reason, and B) should I open a defect for that?

FrankCastellucci (Thu, 19 Jul 2018 20:22:55 GMT):
I hate emoji

jsmitchell (Thu, 19 Jul 2018 20:44:54 GMT):
@Dan on master or 1.0.4?

FrankCastellucci (Thu, 19 Jul 2018 20:55:10 GMT):
@jsmitchell Using the pip installed 1.0.4 sdk

FrankCastellucci (Thu, 19 Jul 2018 21:03:09 GMT):
`pip show sawtooth-sdk`

FrankCastellucci (Thu, 19 Jul 2018 21:03:29 GMT):
Lists the protobufs from the main `/protos` only

jsmitchell (Thu, 19 Jul 2018 21:09:46 GMT):
@FrankCastellucci sorry, different topics - i was asking @dan about the backpressure comment

Dan (Thu, 19 Jul 2018 21:35:27 GMT):
I believe 1.0.4. @grkvlt I think one of the "too much" backpressure anecdotes came from your company. Do you happen to know what I'm talking about?

kirkwood (Mon, 23 Jul 2018 04:01:04 GMT):
Has joined the channel.

zath (Wed, 25 Jul 2018 07:22:18 GMT):
Has joined the channel.

Dan (Wed, 25 Jul 2018 14:23:52 GMT):
@amundson fyi, crypto-lib discussion on the simple signer interface.. writeup: https://docs.google.com/document/d/1BvAXUGR6Gur12yEPbqCegiAuOChiZqEMp6CO3z8RxMk/edit#

amundson (Wed, 25 Jul 2018 14:43:49 GMT):
@Dan ok, thought that was DOA, didn't realize there was actually anything going on there

amundson (Wed, 25 Jul 2018 14:44:31 GMT):
is the proposal to start from scratch and design yet-another-api instead of extending/generalizing the sawtooth approach?

Dan (Wed, 25 Jul 2018 14:57:34 GMT):
Maybe I should have just 'at'-ed you over on the crypto-lib channel so we could have the discussion there. it looks similar to what we have but doesn't yet specify the constructor/factory.

amundson (Wed, 25 Jul 2018 15:08:47 GMT):
yeah, I'm interested in a discussion of iterating on what we have, but not necessarily debating an API from scratch. we did that, remember? not easy, even when we all basically agree. I do wish we would have written up all our opposing requirements more formally.

adeebahmed (Wed, 25 Jul 2018 21:35:52 GMT):
Has joined the channel.

Dan (Thu, 26 Jul 2018 17:42:57 GMT):
@dhuseby was just mentioning alpine linux to me on another thread. supposed to be tiny. i.e. smaller dockers. Maybe as people are developing new components it might be worth trying your new environment on alpine first before defaulting to ubuntu.

dhuseby (Thu, 26 Jul 2018 17:42:57 GMT):
Has joined the channel.

sjqnn (Fri, 27 Jul 2018 16:39:53 GMT):
Has joined the channel.

zZz (Sat, 28 Jul 2018 09:16:17 GMT):
Has joined the channel.

benoit.razet (Mon, 30 Jul 2018 16:03:29 GMT):
Hi, is it normal behavior that a validator cannot process any more transactions after it receives a transaction with an invalid `family_name` in the protobuf of the transaction (like `intkeykey` instead of `intkey`)?

jsmitchell (Mon, 30 Jul 2018 17:26:37 GMT):
sounds like a bug

benoit.razet (Mon, 30 Jul 2018 17:31:15 GMT):
@jsmitchell ok, I filed this ticket https://jira.hyperledger.org/browse/STL-1373

jsmitchell (Mon, 30 Jul 2018 17:40:43 GMT):
wait, are you restricting the transaction families in settings?

jsmitchell (Mon, 30 Jul 2018 17:41:28 GMT):
if not, this probably looks like a new transaction type that the validator just doesn't have a registered transaction processor for yet, and it will pause until one connects.

benoit.razet (Mon, 30 Jul 2018 17:43:06 GMT):
I thought about that. Is this intended? It sounds a little bit risky to me to block all the other legit transactions until a TP connects, because what if it never does?

jsmitchell (Mon, 30 Jul 2018 17:45:19 GMT):
We could potentially change the behavior for publishing (inclusion of an unknown transaction), but I think the behavior needs to be as is for block validation, because how would you make progress otherwise (the alternative is a forked network?). It's good practice to use that setting in any case.
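
For reference, the setting in question is usually set with `sawset` (a sketch; the key path and family list here are assumptions, not taken from this thread):
```
sawset proposal create --key /etc/sawtooth/keys/validator.priv \
  sawtooth.validator.transaction_families='[{"family": "intkey", "version": "1.0"}, {"family": "sawtooth_settings", "version": "1.0"}]'
```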

benoit.razet (Mon, 30 Jul 2018 18:16:22 GMT):
that makes sense for block validation, because at least one validator has been able to execute the transaction. Do you happen to know if using the setting removes the possibility of running into the issue?

benoit.razet (Mon, 30 Jul 2018 18:26:49 GMT):
for the record, for folks running into a similar issue, restarting the validators fixes the problem.

benoit.razet (Mon, 30 Jul 2018 19:08:17 GMT):
@jsmitchell I tried declaring the `transaction_families` with sawtooth_settings and it does not prevent the bug: sending a transaction with no corresponding TP still stops the validator from processing any further transactions. I updated STL-1373 with this info

jsmitchell (Mon, 30 Jul 2018 19:17:47 GMT):
ok, thanks @benoit.razet

FrankCastellucci (Mon, 30 Jul 2018 19:58:14 GMT):
@benoit.razet Good catch and good to know, I thought it would continue to process other txns with legit families registered... didn't realize it gums up the works for all

Johnjam (Tue, 31 Jul 2018 06:24:24 GMT):
I had the same issue, but adding family name and version in sawtooth.validator.transaction_families during the bootstrap solved it with a 'failing transaction...since it isn't required in the configuration'

benoit.razet (Tue, 31 Jul 2018 12:44:52 GMT):
@Johnjam thanks! I take it back, I had not loaded the batch containing the `sawtooth.validator.transaction_families` properly. @FrankCastellucci @jsmitchell

Johnjam (Tue, 31 Jul 2018 13:33:20 GMT):
@benoit.razet I tested with more than one batch at a time and it freezes sometimes. I'm in detached HEAD (commit b3e30e8c7828daff8049551e21a25a426a4d03e8) and when I send 10 batches with an invalid transaction and then 10 batches with a valid transaction afterwards, I have your bug. I don't know if it's already resolved in a newer version of master.

benoit.razet (Tue, 31 Jul 2018 15:08:30 GMT):
I can totally be wrong but if the `sawtooth.validator.transaction_families` is not used then it freezes with 1.0.4 and a master I pulled last week

Johnjam (Tue, 31 Jul 2018 15:14:42 GMT):
In my tests, this configuration was set. I'll try again with the last master when I'll figure out how to setup the new architecture.

zZz (Tue, 31 Jul 2018 16:07:47 GMT):
I have a question: what is the main purpose of initializing so many thread pools (component_thread_pool, network_thread_pool, client_thread_pool, sig_pool) in Sawtooth 1.0?

jsmitchell (Tue, 31 Jul 2018 16:17:02 GMT):
Reminder - general questions should go to #sawtooth - this channel is for core development discussions

zZz (Tue, 31 Jul 2018 16:18:10 GMT):
thank you

diegos (Wed, 01 Aug 2018 16:30:01 GMT):
Has joined the channel.

diegos (Wed, 01 Aug 2018 19:59:42 GMT):
Hi, I was going through the smallbank golang example; playing a little with it I found a couple of things and want to make sure this is the right behavior: 1) If anything in the transaction processor returns an error other than InvalidTransactionError, the transaction is re-sent by the validator again and again until it succeeds or returns InvalidTransactionError. 2) Because of 1), if the transaction being processed has 2 legs (for example debiting one account first and crediting a second account, like saveAccount(new_source_account, context) and saveAccount(new_dest_account, context)), and the second leg fails with an error other than InvalidTransactionError, then when the validator sends the transaction again the first account was already debited on the first try, so it's debited again and again until the second leg passes or you are out of funds on the first account :-) Is this right? If so, I think it's very important to be very defensive in the transaction processor code and catch anything other than InvalidTransactionError, or to avoid doing legs in the TP entirely and instead do 2 transactions inside a batch to be atomic.

Dan (Wed, 01 Aug 2018 22:06:06 GMT):
No intermediate transactions commit in the datastore.

Dan (Wed, 01 Aug 2018 22:06:41 GMT):
Each time the transaction is resubmitted to the TP it will have a fresh context of state.

huy.nguyen (Thu, 02 Aug 2018 03:34:17 GMT):
Has joined the channel.

diegos (Thu, 02 Aug 2018 19:56:17 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=7NcNmsdm4NfbYcDLK) @Dan I did not test this issue with v1.0.5, but with v1.0.4 I fire an InternalError or an unknown error on purpose when doing the second leg, and when the TP is called again the context state is not fresh: it already has the first leg applied. So when the TP processes it successfully the second time, the transaction is committed to the datastore, but with the first account debited twice

jsmitchell (Thu, 02 Aug 2018 19:57:07 GMT):
@diegos 7 that sounds extremely unlikely

jsmitchell (Thu, 02 Aug 2018 19:57:40 GMT):
do you have any logs showing this behavior?

diegos (Thu, 02 Aug 2018 20:03:55 GMT):
:-) yes, it was shocking for me; going to run it again to capture the logs. Actually this issue prompted me to review all my code to catch any error, and I found that the intkey-tp-go is not catching the CBOR errors. Where is the best place to post that? Here? Jira? Or GitHub?

kelly_ (Thu, 02 Aug 2018 21:03:58 GMT):
https://www.hyperledger.org/blog/2018/08/02/from-xos-to-crypto-assets

kelly_ (Thu, 02 Aug 2018 21:04:03 GMT):
nice work everyone! ^

diegos (Thu, 02 Aug 2018 21:22:24 GMT):

Clipboard - August 2, 2018 6:22 PM

diegos (Thu, 02 Aug 2018 21:24:17 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=oRbNnPHoiAu82wdJq) this is v1.0.4, latter going to try the same on v.1.0.5

diegos (Thu, 02 Aug 2018 21:40:38 GMT):
on v1.0.5 it's the same behavior

jsmitchell (Thu, 02 Aug 2018 21:40:42 GMT):
@diegos 7 log the transaction signature in your TP

jsmitchell (Thu, 02 Aug 2018 21:41:04 GMT):
on those Error Arriba lines

diegos (Thu, 02 Aug 2018 21:41:12 GMT):
ok

diegos (Thu, 02 Aug 2018 22:04:40 GMT):
the request signature is already in the log, on these lines: [DEBUG] exosp txn 5dda534650110d95554620a760898b972b53d15d8f3f3eaffb2919e58c37089667bd19754c9a1de5cdd9ea9eb67167b1d773a1da2e2646dad6a55c80571f0c50

jsmitchell (Thu, 02 Aug 2018 22:06:07 GMT):
it looks like it is running the same transaction multiple times during block construction. That should never happen.

diegos (Thu, 02 Aug 2018 22:08:07 GMT):
if the TP returns any error different than InvalidTransactionError, it runs again and again; actually, if it is never successful it loops forever, hanging the validator

jsmitchell (Thu, 02 Aug 2018 22:09:20 GMT):
yes, that is by design

jsmitchell (Thu, 02 Aug 2018 22:09:32 GMT):
if your transaction is invalid, you need to return InvalidTransactionError

diegos (Thu, 02 Aug 2018 22:09:51 GMT):
correct, learned that the hard way :-)

diegos (Thu, 02 Aug 2018 22:11:50 GMT):
the problem is that a glitch could happen, and if the transaction has more than one leg it's not atomic

jsmitchell (Thu, 02 Aug 2018 22:12:22 GMT):
there are lots of ways a bad transaction processor can cause non-determinism

jsmitchell (Thu, 02 Aug 2018 22:14:12 GMT):
What did the first execution of that transaction result in? Success, an InternalError, or an InvalidTransaction?

diegos (Thu, 02 Aug 2018 22:16:10 GMT):
It fails 2 times on purpose, sending an unknown error (fmt.Errorf("something")); it also has the same behavior if I send InternalError

jsmitchell (Thu, 02 Aug 2018 22:16:15 GMT):
do you still have that system running?

jsmitchell (Thu, 02 Aug 2018 22:16:48 GMT):
it would be interesting to see the raw block contents of the block which starts 0d72fe

diegos (Thu, 02 Aug 2018 22:18:51 GMT):
I have it all in docker-compose and could start from zero. Now that I did the test using v1.0.5, I can run it from zero, capturing the new log and the blocks.

jsmitchell (Thu, 02 Aug 2018 22:20:37 GMT):
ok, here is my suspicion -- the block only contains the transaction once. It is being rerun three times deterministically during both block publishing and block validation because of what you are doing with returning InternalError. That all makes sense. What does not make sense is that the second and third invocations of the transaction during publishing seem to start from an invalid base context.

jsmitchell (Thu, 02 Aug 2018 22:20:49 GMT):
@boydjohnson @pschwarz ^

jsmitchell (Thu, 02 Aug 2018 22:23:08 GMT):
possible incorrect behavior in context manager on failed transaction

diegos (Thu, 02 Aug 2018 22:23:15 GMT):
Yes, in the log you can see the starting balance of each new try (DEC balance: 9000 amount: 1000 newbalance: 8000)

jsmitchell (Thu, 02 Aug 2018 22:23:36 GMT):
yes, i understand. This is totally weird.

diegos (Thu, 02 Aug 2018 22:30:21 GMT):
the intkey-tp-go is not catching the CBOR errors. where is the best place to post that? here? jira? or github?

pschwarz (Fri, 03 Aug 2018 00:02:30 GMT):
No, it will retry InternalError transactions, just with a backoff - the expectation is that the TP needs to restart

lcinacio (Sat, 04 Aug 2018 17:10:25 GMT):
Has joined the channel.

diegos (Mon, 06 Aug 2018 14:04:09 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=8noLswr5H3mP5mmAF) @pschwarz Hi, sorry, I created some confusion; there are 2 different issues. (1) The first is that when a transaction fails on a second leg with an InternalError or an unknown error after doing the first leg (like debiting a FROM wallet), then when the transaction is processed again, the context is not fresh: the wallet has already been debited on the first try, so if the transaction now succeeds (both first and second legs), the FROM wallet is debited twice. And (2) the intkey-tp-go is not catching the CBOR errors correctly.
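
To make the defensive pattern concrete, a hedged Python sketch (the payload layout and CBOR encoding are assumptions; the point is to decide validity up front, reserve InternalError for transient trouble, and write both legs in one set_state call):
```python
import cbor  # assumed payload/state encoding

from sawtooth_sdk.processor.exceptions import (
    InternalError, InvalidTransactionError)


def apply_transfer(transaction, context):
    try:
        payload = cbor.loads(transaction.payload)
        src_addr, dst_addr = payload['src'], payload['dst']
        amount = int(payload['amount'])
    except Exception as err:
        # A malformed payload is the client's fault: fail deterministically.
        raise InvalidTransactionError('Bad payload: {}'.format(err))

    entries = {e.address: cbor.loads(e.data)
               for e in context.get_state([src_addr, dst_addr])}
    if src_addr not in entries or dst_addr not in entries:
        raise InvalidTransactionError('Unknown account')

    src, dst = entries[src_addr], entries[dst_addr]
    if src['balance'] < amount:
        raise InvalidTransactionError('Insufficient funds')

    src['balance'] -= amount
    dst['balance'] += amount

    # One set_state for both legs: the context records the whole transfer
    # or none of it, so a retry never sees a half-applied state.
    if not context.set_state({src_addr: cbor.dumps(src),
                              dst_addr: cbor.dumps(dst)}):
        raise InternalError('set_state failed; the validator may retry')
```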

RealDeanZhao (Tue, 07 Aug 2018 09:33:57 GMT):
I tried to set up the env using the PoET simulator, but the poet-key-state-* files are created by the user root, which means the validator has no permission on them:
```
-rw-r--r-- 1 sawtooth sawtooth 1099511627776 Aug  7 09:27 block-00.lmdb
-rw-r--r-- 1 sawtooth sawtooth          8192 Aug  7 09:27 block-00.lmdb-lock
-rw-r--r-- 1 sawtooth sawtooth           128 Aug  7 09:27 block-chain-id
-rw-r--r-- 1 sawtooth sawtooth 1099511627776 Aug  7 09:27 merkle-00.lmdb
-rw-r--r-- 1 sawtooth sawtooth          8192 Aug  7 09:27 merkle-00.lmdb-lock
-rw-r--r-- 1 root     root     1099511627776 Aug  7 09:27 poet-key-state-0333172a.lmdb
-rw-r--r-- 1 root     root              8192 Aug  7 09:27 poet-key-state-0333172a.lmdb-lock
```

Jirateep (Sat, 11 Aug 2018 12:52:05 GMT):
Has joined the channel.

Johannes2511 (Wed, 15 Aug 2018 07:21:06 GMT):
Has joined the channel.

henrytill (Thu, 16 Aug 2018 17:36:20 GMT):
Has joined the channel.

amundson (Thu, 16 Aug 2018 23:52:04 GMT):
we will likely start a release discussion on #sawtooth-release soon

TomBarnes (Thu, 16 Aug 2018 23:52:30 GMT):
great!

Gabe (Fri, 17 Aug 2018 20:26:00 GMT):
Has joined the channel.

alchmeina (Thu, 23 Aug 2018 14:16:36 GMT):
Has joined the channel.

TomBarnes (Fri, 24 Aug 2018 18:24:13 GMT):
I would like to update the list of run-time dependencies for Sawtooth.

TomBarnes (Fri, 24 Aug 2018 18:25:38 GMT):
Can I get some help in confirming that the following list contains all of the current Rust run-time dependencies for sawtooth-core, sawtooth-poet, and sawtooth-raft?

TomBarnes (Fri, 24 Aug 2018 18:28:49 GMT):
```
module         version    role          path
------         -------    ----          ----
clap           >=2.29.0   dependencies  sawtooth-core\adm\Cargo.toml
libc           >=0.2.35   dependencies  sawtooth-core\adm\Cargo.toml
lmdb-zero      >=0.4.1    dependencies  sawtooth-core\adm\Cargo.toml
protobuf       2.0        dependencies  sawtooth-core\adm\Cargo.toml
sawtooth_sdk   none       dependencies  sawtooth-core\adm\Cargo.toml
serde          1.0        dependencies  sawtooth-core\adm\Cargo.toml
serde_derive   1.0        dependencies  sawtooth-core\adm\Cargo.toml
serde_yaml     0.7        dependencies  sawtooth-core\adm\Cargo.toml
ctrlc          3.0        dependencies  sawtooth-core\sdk\rust\Cargo.toml
hex            0.3        dependencies  sawtooth-core\sdk\rust\Cargo.toml
log            0.3        dependencies  sawtooth-core\sdk\rust\Cargo.toml
rand           0.4.2      dependencies  sawtooth-core\sdk\rust\Cargo.toml
rust-crypto    0.2.36     dependencies  sawtooth-core\sdk\rust\Cargo.toml
secp256k1      0.7.1      dependencies  sawtooth-core\sdk\rust\Cargo.toml
uuid           0.5        dependencies  sawtooth-core\sdk\rust\Cargo.toml
zmq            none       dependencies  sawtooth-core\sdk\rust\Cargo.toml
log4rs         0.8        dependencies  sawtooth-raft\Cargo.toml
log4rs-syslog  3.0        dependencies  sawtooth-raft\Cargo.toml
raft           0.3.0      dependencies  sawtooth-raft\Cargo.toml
serde_json     1          dependencies  sawtooth-raft\Cargo.toml
```

TomBarnes (Fri, 24 Aug 2018 18:30:16 GMT):
maybe this is better

TomBarnes (Fri, 24 Aug 2018 18:30:26 GMT):

Clipboard - August 24, 2018 11:30 AM

TomBarnes (Fri, 24 Aug 2018 18:31:19 GMT):
Can I get some help in confirming that the following list contains all of the current Python run-time dependencies for Sawtooth-core, sawtooth-poet, and sawtooth-raft?

TomBarnes (Fri, 24 Aug 2018 18:32:03 GMT):

Clipboard - August 24, 2018 11:31 AM

TomBarnes (Sat, 25 Aug 2018 01:12:32 GMT):
Are we still using lmdb (py-lmdb) now that we've moved to Rust? Are there other Python packages still called out in setup.py files that are no longer being used?

TomBarnes (Sat, 25 Aug 2018 01:13:10 GMT):
The py-lmdb project no longer has a maintainer, so it's probably not a good component to be using in the long term.

wchang (Fri, 31 Aug 2018 00:11:06 GMT):
Has joined the channel.

ZorbaGrue (Fri, 31 Aug 2018 13:45:08 GMT):
Has joined the channel.

benoit.razet (Fri, 31 Aug 2018 15:13:09 GMT):
@jsmitchell @Dan @pschwarz I've added a comment to https://jira.hyperledger.org/browse/STL-1374 about a backpressure issue

benoit.razet (Fri, 31 Aug 2018 15:16:52 GMT):
I tried to follow the path in the sawtooth-core code to see if I could spot anything in the `ClientBatchSubmitBackpressureHandler` code and its ramifications, but could not spot anything :( The ramifications go pretty far, with all the components that are on the `client_thread_pool`

Dan (Fri, 31 Aug 2018 15:18:34 GMT):
thanks

benoit.razet (Fri, 31 Aug 2018 15:18:34 GMT):
Unfortunately I don't observe the bug on all the networks, only on one. I hope it's not a configuration thing.

amolk (Fri, 31 Aug 2018 15:30:06 GMT):
Hi @benoit.razet I assume you're not doing the TP stop/start step mentioned in the issue. Also, I'm wondering if the issue is resolved in the master branch. We used to observe significant backpressure a couple of months ago but things seem pretty stable now.

benoit.razet (Fri, 31 Aug 2018 15:58:18 GMT):
@amolk that's right, no TP stop/start step. Independently, I've been preparing for moving to 1.1 (master). Do you suggest I should do that sooner rather than later?

sureshtedla (Fri, 31 Aug 2018 17:31:53 GMT):
Has joined the channel.

pschwarz (Fri, 31 Aug 2018 18:25:57 GMT):
The way that I'm reading that bug, I think the submitter is running into an issue that has more to do with how that particular version of the workload generator is written. It doesn't know how to deal with back pressure, so it doesn't know _not_ to submit increment and decrement txns on keys that were never set (due to back-pressure), and it ends up submitting a bunch of invalid transactions

benoit.razet (Fri, 31 Aug 2018 19:11:48 GMT):
ah! I'll continue working on narrowing down the problem

ZorbaGrue (Sat, 01 Sep 2018 09:22:28 GMT):
[Java-SDK] Hi. I keep struggling to find the appropriate *sawtooth.sdk.protobuf.Message.MessageType* to use when trying to filter results by a specific field. There are a lot of values defined, and for the ones I tried I keep getting the same exception when setting a field on the request: `java.lang.IllegalArgumentException: FieldDescriptor does not match message type.` Do you have any ideas/suggestions regarding this? Thanks.

rjones (Sun, 02 Sep 2018 13:53:47 GMT):
@ZorbaGrue you should ask in #sawtooth

deb (Tue, 04 Sep 2018 09:25:18 GMT):
When I am running the validator for off-chain permissioning, I get the following error:

deb (Tue, 04 Sep 2018 09:25:20 GMT):
ubuntu@ip-172-31-27-120:~$ sudo -u sawtooth sawtooth-validator -vv
```
[2018-09-04 09:24:58.661 INFO path] Skipping path loading from non-existent config file: /etc/sawtooth/path.toml
[2018-09-04 09:24:58.662 INFO validator] Loading validator information from config: /etc/sawtooth/validator.toml
Traceback (most recent call last):
  File "/usr/bin/sawtooth-validator", line 9, in <module>
    load_entry_point('sawtooth-validator==1.0.5', 'console_scripts', 'sawtooth-validator')()
  File "/usr/lib/python3/dist-packages/sawtooth_validator/server/cli.py", line 238, in main
    load_validator_config(opts_config, path_config.config_dir)
  File "/usr/lib/python3/dist-packages/sawtooth_validator/server/cli.py", line 179, in load_validator_config
    toml_config = load_toml_validator_config(conf_file)
  File "/usr/lib/python3/dist-packages/sawtooth_validator/config/validator.py", line 60, in load_toml_validator_config
    toml_config = toml.loads(raw_config)
  File "/usr/lib/python3/dist-packages/toml.py", line 331, in loads
    value, vtype = load_value(pair[1])
  File "/usr/lib/python3/dist-packages/toml.py", line 451, in load_value
    return (load_array(v), "array")
  File "/usr/lib/python3/dist-packages/toml.py", line 513, in load_array
    nval, ntype = load_value(a[i])
  File "/usr/lib/python3/dist-packages/toml.py", line 418, in load_value
    raise Exception("Stuff after closed string. WTF?")
Exception: Stuff after closed string. WTF?
```

jsmitchell (Tue, 04 Sep 2018 14:58:03 GMT):
@deb sounds like you have an invalid validator.toml file

jsmitchell (Tue, 04 Sep 2018 15:05:04 GMT):
If you are using master and haven’t edited the toml file, there was just a PR merged which corrected a typo in the example file
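
A quick way to check a validator.toml outside the validator is to parse it with the same `toml` module the validator uses; a minimal sketch (file path assumed):
```python
# Minimal sketch: validate a validator.toml before starting the validator.
# Assumes the `toml` package (the same one the validator uses) is installed.
import sys

import toml

path = "/etc/sawtooth/validator.toml"
try:
    with open(path) as f:
        config = toml.loads(f.read())
except Exception as err:
    # A malformed value (e.g. stray text after a closed string) fails here,
    # mirroring the traceback from sawtooth-validator.
    sys.exit("{} is not valid TOML: {}".format(path, err))

print("parsed keys:", sorted(config))
```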

benoit.razet (Tue, 04 Sep 2018 15:19:39 GMT):
@jsmitchell @pschwarz @Dan I added a comment to https://jira.hyperledger.org/browse/STL-1374 on my side; it was actually a red herring, nothing wrong with the back pressure :thumbsup: . The problem was that I forgot to add `sawtooth_validator_registry` to the on-chain setting of transaction families. The validators were able to process batches for some number of blocks until they were locked out, because they could not register their pubkeys: the `sawtooth_validator_registry` txns were filtered out.
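
For reference, the on-chain family filter is a JSON list in the `sawtooth.validator.transaction_families` setting; when running PoET it has to include the validator registry family. A sketch of the proposal command (family names and versions here are illustrative):
```
sawset proposal create \
  --key /etc/sawtooth/keys/validator.priv \
  sawtooth.validator.transaction_families='[
    {"family": "intkey", "version": "1.0"},
    {"family": "sawtooth_settings", "version": "1.0"},
    {"family": "sawtooth_validator_registry", "version": "1.0"}]'
```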

FrankCastellucci (Tue, 04 Sep 2018 20:25:15 GMT):
@benoit.razet (et. al.) is that just a problem when using PoET or is it independent of consensus (neglecting to add `sawtooth.validator.transaction_families`)? TIA

benoit.razet (Tue, 04 Sep 2018 20:30:39 GMT):
good question @FrankCastellucci I always assumed the `sawtooth_validator_registry` was for PoET, but maybe it's for other consensus engines too. I'm curious to know the answer now.

FrankCastellucci (Tue, 04 Sep 2018 20:32:54 GMT):
btw: On last weeks Tech meeting @Dan and I were discussing zksnarks, commitments, nullifiers and zk-rangeproofs... a WIP

Dan (Tue, 04 Sep 2018 20:36:05 GMT):
sawtooth_validator_registry is just used by poet.

amolk (Wed, 05 Sep 2018 05:46:10 GMT):
BTW, we've been seeing significant backpressure the last few nightly builds (1455-1 onwards)

FrankCastellucci (Wed, 05 Sep 2018 20:17:25 GMT):
Are there plans to move the Python builds up to 3.5.3+? Most specifically, the signing dependency on secp256k1 breaks on install with later Python versions.

Dan (Thu, 06 Sep 2018 01:15:59 GMT):
Nope no plans (at least that I have) to upgrade python. Longer term I'm interested in replacing secp, but that's pretty long term. One option will be the crypto-lib lab but we'll need to see how that comes together.

Naman_13 (Thu, 06 Sep 2018 10:41:42 GMT):
Has joined the channel.

Naman_13 (Thu, 06 Sep 2018 10:41:55 GMT):
I had a question, with respect to Event subscription through this developer guide - https://sawtooth.hyperledger.org/docs/core/nightly/master/app_developers_guide/zmq_event_subscription.html Can somebody tell me where exactly I'm supposed to write the subscription? Should I write it in my web-server python file or should I write it in my transaction processor file? Could somebody give me a code example I could see and probably implement? Thanks!

Dan (Thu, 06 Sep 2018 12:10:36 GMT):
Hi @Naman_13 can you try posting this on #sawtooth? We use this channel for discussing internal development of sawtooth.
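
For reference, the subscription from that guide lives in a separate client process (for example, the web server), not in the transaction processor. A minimal sketch following the documented ZMQ flow; the endpoint and event type are assumptions:
```python
# Minimal sketch of ZMQ event subscription, based on the linked guide.
# Assumes a validator component endpoint at tcp://localhost:4004.
from sawtooth_sdk.messaging.stream import Stream
from sawtooth_sdk.protobuf import client_event_pb2, events_pb2
from sawtooth_sdk.protobuf.validator_pb2 import Message

stream = Stream('tcp://localhost:4004')

# Subscribe to block-commit events.
subscription = events_pb2.EventSubscription(event_type='sawtooth/block-commit')
request = client_event_pb2.ClientEventsSubscribeRequest(
    subscriptions=[subscription])
future = stream.send(
    message_type=Message.CLIENT_EVENTS_SUBSCRIBE_REQUEST,
    content=request.SerializeToString())

response = client_event_pb2.ClientEventsSubscribeResponse()
response.ParseFromString(future.result().content)
assert response.status == client_event_pb2.ClientEventsSubscribeResponse.OK

# Events then arrive as CLIENT_EVENTS messages on the same stream.
msg = stream.receive().result()
if msg.message_type == Message.CLIENT_EVENTS:
    event_list = events_pb2.EventList()
    event_list.ParseFromString(msg.content)
    for event in event_list.events:
        print(event)
```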

benoit.razet (Thu, 06 Sep 2018 16:42:06 GMT):
Recently, we've started to stress the validators more than we previously did, by submitting batches at a high pace, and possibly resubmitting identical batches. That must stress the validators in a way we did not in the past, and they start to fail with the following message: ```
[2018-09-06 16:24:39.253 ERROR future] An unhandled error occurred while running future callback
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sawtooth_validator/networking/future.py", line 79, in run_callback
    self._callback_func(self._request, self._result)
  File "/usr/lib/python3/dist-packages/sawtooth_validator/execution/executor.py", line 154, in _future_done_callback
    error_data=response.extended_data)
  File "/usr/lib/python3/dist-packages/sawtooth_validator/execution/scheduler_parallel.py", line 659, in set_transaction_execution_result
    txn_signature)
  File "/usr/lib/python3/dist-packages/sawtooth_validator/execution/scheduler_parallel.py", line 580, in _remove_subsequent_result_because_of_batch_failure
    if self._is_txn_to_replay(txn_id, poss_successor, seen):
  File "/usr/lib/python3/dist-packages/sawtooth_validator/execution/scheduler_parallel.py", line 559, in _is_txn_to_replay
    possible_successor)
  File "/usr/lib/python3/dist-packages/sawtooth_validator/execution/scheduler_parallel.py", line 540, in _is_in_same_batch
    self._batches_by_txn_id[txn_id_2]
KeyError: '746807039356e72cf30c653b925e8b955d1664aff1303db68f6069c15c9504a84b1dd4fe3024c6a7ed103316ae1dc922629fb008b0146e3c1346677e4c1bd95b'
```

benoit.razet (Thu, 06 Sep 2018 16:42:23 GMT):
and that's after more than 40000 blocks in

benoit.razet (Thu, 06 Sep 2018 16:42:43 GMT):
any idea?

jsmitchell (Thu, 06 Sep 2018 16:43:26 GMT):
@boydjohnson ^

LeonardoCarvalho (Fri, 07 Sep 2018 13:09:49 GMT):
Has joined the channel.

LeonardoCarvalho (Fri, 07 Sep 2018 13:11:41 GMT):
Hello all, I'm here as well

LeonardoCarvalho (Fri, 07 Sep 2018 13:11:42 GMT):
:)

LeonardoCarvalho (Fri, 07 Sep 2018 13:11:55 GMT):
So, a little question

LeonardoCarvalho (Fri, 07 Sep 2018 13:11:57 GMT):
I'm using a DEALER - ROUTER - ROUTER 0mq topology, and I need to know in advance, or at connection time, the socket ID of the 0mq socket on the remote validator. Looks like this is a known pattern: https://github.com/zeromq/pyzmq/issues/974

LeonardoCarvalho (Fri, 07 Sep 2018 13:12:14 GMT):
I've changed my core code with these lines:

LeonardoCarvalho (Fri, 07 Sep 2018 13:12:50 GMT):

Clipboard - September 7, 2018 10:12 AM

LeonardoCarvalho (Fri, 07 Sep 2018 13:13:07 GMT):
in sawtooth_validator/networking/interconnect.py

LeonardoCarvalho (Fri, 07 Sep 2018 13:13:29 GMT):
@ line 342

LeonardoCarvalho (Fri, 07 Sep 2018 13:13:51 GMT):
should I create a ticket and submit a change? Would that be useful?

danintel (Fri, 07 Sep 2018 17:01:03 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=MMSuwSwea53QzCoDt) @LeonardoCarvalho Yes, please create a JIRA ticket. A PR would be good too.

amundson (Fri, 07 Sep 2018 21:08:22 GMT):
@LeonardoCarvalho is that topology the same or different than what the other components are using?

LeonardoCarvalho (Fri, 07 Sep 2018 21:15:06 GMT):
I'm not aware ...

LeonardoCarvalho (Fri, 07 Sep 2018 21:18:27 GMT):
But I can say that, if there are no empty 0mq messages being used (and I didn't find any), the impact is zilch

amundson (Sat, 08 Sep 2018 02:46:13 GMT):
@LeonardoCarvalho I'd like to understand what is different between what we do now and what you are proposing. We have quite a few different SDK implementations, so I suspect a difference in your implementation that might be unnecessary. The impact of a change here is not zilch, since the behavior of the validator is part of the stable API surface; if this enables a new pattern, we want to understand whether that pattern is a desirable thing to support (for potentially a long time). Then we have to convince ourselves that there are no backward-incompatible side-effects (probably easy, but at least one reviewer would have to dig deep).

LeonardoCarvalho (Sat, 08 Sep 2018 12:03:02 GMT):
Sure, I'll try to summarize here, I can create a ticket with more info on JIRA, to further discussion

LeonardoCarvalho (Sat, 08 Sep 2018 12:04:55 GMT):
Basically, during the creation of a Reactive pattern on the Java SDK, I noticed that a ROUTER socket on the TP could replicate the Majordomo Protocol from ZeroMQ

LeonardoCarvalho (Sat, 08 Sep 2018 12:04:59 GMT):
https://rfc.zeromq.org/spec:7/MDP/

LeonardoCarvalho (Sat, 08 Sep 2018 12:05:38 GMT):
to deliver the messages to backend DEALER sockets, using IPC instead of TCP addresses

LeonardoCarvalho (Sat, 08 Sep 2018 12:06:11 GMT):
the idea behind this is to accept more than one TP on the same machine or even JVM

LeonardoCarvalho (Sat, 08 Sep 2018 12:08:03 GMT):
to stay agnostic on naming conventions or configurations, the idea of using the ZMQ_PROBE_ROUTER flag looked like a very good fit for a faster handshake

LeonardoCarvalho (Sat, 08 Sep 2018 12:08:08 GMT):
http://api.zeromq.org/4-1:zmq-setsockopt

LeonardoCarvalho (Sat, 08 Sep 2018 12:09:43 GMT):
I've searched the server code and didn't find a zero-sized message in it, and changed the lines of code to implement the probing message, with success

LeonardoCarvalho (Sat, 08 Sep 2018 12:11:58 GMT):
Since the implementation of clients talking to ROUTER sockets wouldn't be affected (it's only triggered with the flag), I thought the impact would be negligible
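
A minimal pyzmq sketch of the probe handshake being described; the socket names and the ipc endpoint are illustrative:
```python
# With ZMQ_PROBE_ROUTER set, a socket sends an empty message to every peer
# it connects to, so the remote ROUTER learns the sender's identity at
# connection time instead of after the first real request.
import zmq

ctx = zmq.Context()

frontend = ctx.socket(zmq.ROUTER)
frontend.bind('ipc://frontend.ipc')

peer = ctx.socket(zmq.ROUTER)
peer.setsockopt(zmq.IDENTITY, b'tp-1')       # identity must be set first
peer.setsockopt(zmq.PROBE_ROUTER, 1)         # emit zero-length probe on connect
peer.connect('ipc://frontend.ipc')

# The probe arrives as [identity, empty frame]; the frontend can now route
# to b'tp-1' without waiting for the TP to speak first.
identity, empty = frontend.recv_multipart()
assert empty == b''
print('learned peer identity:', identity)
```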

cfzhang (Sun, 09 Sep 2018 19:37:37 GMT):
Has joined the channel.

arsulegai (Tue, 11 Sep 2018 18:55:46 GMT):
Has joined the channel.

adamgering (Fri, 14 Sep 2018 19:28:56 GMT):
Has joined the channel.

kthblmfld (Mon, 17 Sep 2018 16:13:34 GMT):
Has joined the channel.

adamludvik (Tue, 18 Sep 2018 15:22:12 GMT):
@LeonardoCarvalho Hi. I am somewhat familiar with the Majordomo pattern and ROUTER-ROUTER in general. Currently we are using the ROUTER-DEALER request-reply pattern, which has been stabilized as part of the Transaction Processor API. If I understand, you are suggesting that we need ROUTER-ROUTER to support multiple TPs with a single validator. We have already added support for multiple TPs with a single validator at the application level, so using ROUTER-ROUTER for the TP API seems unnecessary.

LeonardoCarvalho (Tue, 18 Sep 2018 16:08:58 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=MkxKiv33zZti9KEEs) @adamludvik No no, we don't *need*, but would be nice to give the capability to the TPs

LeonardoCarvalho (Tue, 18 Sep 2018 16:09:17 GMT):
in my case, I got a very good boost of I/O using this pattern

adamludvik (Tue, 18 Sep 2018 16:16:05 GMT):
"got a very good boost of I/O" <- okay you have my attention :)

adamludvik (Tue, 18 Sep 2018 16:17:21 GMT):
Can you say a little more about where the backend DEALER sockets and what is using them? Are you suggesting we insert something between the validator and TPs to do routing?

jsmitchell (Tue, 18 Sep 2018 16:27:01 GMT):
@LeonardoCarvalho what sort of IO boost? What do you attribute that improvement to?

LeonardoCarvalho (Tue, 18 Sep 2018 22:18:41 GMT):
I got a 4X improvement, without too many changes in the code. The fact is, with that topology the receiving socket has much less work to do; it only propagates to the backend. The backend receives the answers or requests and transmits them to the frontend

LeonardoCarvalho (Tue, 18 Sep 2018 22:19:32 GMT):
My feeling is that the I/O got this boost merely by reusing the threads in a more efficient way. See, the receiving and transmitting operations got totally decoupled.

LeonardoCarvalho (Tue, 18 Sep 2018 22:24:40 GMT):
and, I think, the Reactive architecture must be related to this gain as well; the code fits it better using the IPC backends

LeonardoCarvalho (Tue, 18 Sep 2018 23:54:22 GMT):
Using a sawtooth server with the code change I've suggested, there's some code to evaluate the gains on my fork of the java-sdk project; feel very welcome to give any feedback

amundson (Wed, 19 Sep 2018 00:03:13 GMT):
I wouldn't have guessed the Java SDK was slow enough to be the bottleneck

LeonardoCarvalho (Wed, 19 Sep 2018 10:06:39 GMT):
It was; it could only work on *one message at a time* in the original version

amundson (Wed, 19 Sep 2018 16:12:10 GMT):
we don't see 0MQ being a bottleneck in the other SDKs, though they do not consistently parallelize requests currently. Go is probably the most correct in that respect.

wyatt-noise (Wed, 19 Sep 2018 23:58:04 GMT):
Has joined the channel.

manju.ac (Thu, 20 Sep 2018 08:54:36 GMT):
Has joined the channel.

LeonardoCarvalho (Thu, 20 Sep 2018 09:40:10 GMT):
Ah, well, I always think in terms of high loads, and there was a ticket on JIRA about multi-threading the Java SDK, so...

kelly_ (Thu, 20 Sep 2018 15:04:04 GMT):
Hey All - I wanted to get some thoughts on how to coordinate Sawtooth feature planning moving forward. I had a conversation with @amundson a couple weeks ago, who said that he was looking at point-to-point (Quorum-style) private transactions. I know some other folks have also been looking at that as a feature

kelly_ (Thu, 20 Sep 2018 15:05:04 GMT):
Not sure if we should have some sort of living document where people detail what they are working on for the next couple months so we can look for overlap

kelly_ (Thu, 20 Sep 2018 15:05:33 GMT):
e.g. @LeonardoCarvalho could call out the JavaSDK to ensure that no one else was working on something in parallel, and if so they could connect to collaborate on it

kelly_ (Thu, 20 Sep 2018 15:05:46 GMT):
other options would be like a weekly call, or a combined community standup, etc.

kelly_ (Thu, 20 Sep 2018 15:06:32 GMT):
just looking for any feedback for what would be easiest/most preferable for the community members

amundson (Thu, 20 Sep 2018 15:10:27 GMT):
@kelly_ that's not how I would summarize the feature exactly, but it's close enough that if there are others doing similar work they should talk to me. we will submit an RFC when we know exactly what we are proposing.

kelly_ (Thu, 20 Sep 2018 15:11:05 GMT):
ok, i'm less concerned about that specific feature than about how to keep the community aligned and make sure that too much thinking doesn't happen in isolation

amundson (Thu, 20 Sep 2018 15:13:23 GMT):
I'm not sure how to solve it, but the difficulty will almost certainly be that some of us don't like to advertise features until we can commit to them for sure. We are probably at an extreme right now in terms of being conservative about it.

amundson (Thu, 20 Sep 2018 15:14:00 GMT):
(part of the reason we have RFC process is to help with that though)

kelly_ (Thu, 20 Sep 2018 15:14:38 GMT):
Yea I don't think it's about committing to the feature, but more just raising awareness among the community participants on who is working on or investigating what

kelly_ (Thu, 20 Sep 2018 15:16:22 GMT):
as always this would be optional, so just looking for feedback on what works or doesnt work for people

kelly_ (Thu, 20 Sep 2018 15:16:55 GMT):
I think a living document is probably the easiest because it's asynchronous

kelly_ (Thu, 20 Sep 2018 15:17:29 GMT):
if people want to put what they are working on great, if they don't feel comfortable, that's also their choice, but at least they would get the opportunity to see if someone else had called it out

adamludvik (Thu, 20 Sep 2018 15:25:56 GMT):
@kelly_ I recall from a conversation with the Indy team that they just solved this problem after experiencing a couple instances where two teams developed basically the same thing in isolation. It might be good to reach out to them and see if they have a solution that is working well.

kelly_ (Thu, 20 Sep 2018 15:35:08 GMT):
@adamludvik thanks that is a great suggestion

Dan (Thu, 20 Sep 2018 15:36:25 GMT):
Some of us will be in Montreal. That's more of a one-off than a pattern for how we stay clear. I would like to take advantage of that time though.

kelly_ (Thu, 20 Sep 2018 15:38:53 GMT):
yep, I'm booking my flight out there today

kelly_ (Thu, 20 Sep 2018 16:18:19 GMT):
"@kelly_ We use a shared roadmap in the wiki where each team declares what they plan to work on: https://wiki.hyperledger.org/projects/indy/roadmap We should probably do better at keeping that up to date. We also have a monthly meeting were we discuss priorities together."

kelly_ (Thu, 20 Sep 2018 16:18:32 GMT):
^ @adamludvik that was what the Indy team said

kelly_ (Thu, 20 Sep 2018 16:18:35 GMT):
that seems pretty reasonable

kelly_ (Thu, 20 Sep 2018 16:19:03 GMT):
we could maybe even allocate 1 of the tech forums or app dev calls per month to that open discussion

adamludvik (Thu, 20 Sep 2018 16:30:33 GMT):
I think I am good with updating a public roadmap doc once a month. I'm not sure about using a wiki though, I've heard they tend to fall out of date.

Dan (Thu, 20 Sep 2018 16:34:45 GMT):
I can't imagine where you heard that.

adamludvik (Thu, 20 Sep 2018 16:47:26 GMT):
I think it was in a wiki somewhere

Dan (Thu, 20 Sep 2018 17:29:21 GMT):
I had a google docs link for it but it got buried in the others.

kelly_ (Thu, 20 Sep 2018 17:32:57 GMT):
I would go for google docs too as well

boydjohnson (Thu, 20 Sep 2018 18:04:53 GMT):

test_duplicates_dependencies_20_iter_branch.png

boydjohnson (Thu, 20 Sep 2018 18:05:05 GMT):
@jsmitchell @adamludvik ^

boydjohnson (Thu, 20 Sep 2018 18:05:19 GMT):
It is all signing and test framework.

Dan (Thu, 20 Sep 2018 18:12:04 GMT):
duplicates and dependencies?

adamludvik (Thu, 20 Sep 2018 18:16:24 GMT):
@boydjohnson heh

adamludvik (Thu, 20 Sep 2018 18:16:53 GMT):
@Dan we are trying to move more of our technical discussions to this channel, so that is carried over from another channel

adamludvik (Thu, 20 Sep 2018 18:17:16 GMT):
So the fix seems to make no difference on performance?

jsmitchell (Thu, 20 Sep 2018 18:18:33 GMT):
SWEET

boydjohnson (Thu, 20 Sep 2018 18:18:46 GMT):
Well, that is the semi-optimized code. I can do another test with the naive algorithm and before.

jsmitchell (Thu, 20 Sep 2018 18:19:01 GMT):
nice work on getting that callgrind/viz stuff together @boydjohnson and @adamludvik

adamludvik (Thu, 20 Sep 2018 18:22:14 GMT):
@boydjohnson definitely did all of the work on that

jsmitchell (Thu, 20 Sep 2018 18:23:19 GMT):
:clap:

boydjohnson (Thu, 20 Sep 2018 18:23:28 GMT):
Yay!

jsmitchell (Thu, 20 Sep 2018 18:23:46 GMT):
@boydjohnson is this cpu time or wall clock time?

boydjohnson (Thu, 20 Sep 2018 18:24:42 GMT):
I am a little unsure. Is there a flag I would have to pass to get one or the other? I did `valgrind --tool=callgrind` on the binary.

boydjohnson (Thu, 20 Sep 2018 18:26:55 GMT):
I think it is cpu time.

jsmitchell (Thu, 20 Sep 2018 18:30:15 GMT):
@boydjohnson are all these being signed by the same key?

jsmitchell (Thu, 20 Sep 2018 18:30:25 GMT):
you should definitely be caching the signing context

boydjohnson (Thu, 20 Sep 2018 18:31:08 GMT):
Yes, I used the smallbank_workload iterator and passed it the same Signer.

jsmitchell (Thu, 20 Sep 2018 18:31:12 GMT):
35% of the cpu time is being spent generating the public key during the signing operation

boydjohnson (Thu, 20 Sep 2018 18:32:52 GMT):
Is that a misuse of the signing library in Rust SDK, or an inefficiency within it?

jsmitchell (Thu, 20 Sep 2018 18:33:22 GMT):
probably a misuse -- we had the same problem in the python lib and @Dan had added some caching
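
A sketch of what that caching looks like with the Python sawtooth_signing API (the Rust SDK has analogous Context/Signer types); the point is to build the context and signer once and reuse them:
```python
# Sketch: cache the signing context/signer instead of recreating them
# per transaction.
from sawtooth_signing import create_context, CryptoFactory

# Do this once, at startup ...
context = create_context('secp256k1')
private_key = context.new_random_private_key()
signer = CryptoFactory(context).new_signer(private_key)
public_key_hex = signer.get_public_key().as_hex()  # cache this too

# ... then reuse the signer for every transaction/batch header:
for payload in (b'txn-1', b'txn-2', b'txn-3'):
    signature = signer.sign(payload)
```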

adamludvik (Thu, 20 Sep 2018 18:34:46 GMT):
@boydjohnson is this conclusive evidence that those spikes in execution time you were seeing earlier are a result of waiting for the GIL?

boydjohnson (Thu, 20 Sep 2018 18:35:50 GMT):
Maybe not conclusive yet. Those graphs were run on the pre-naive algorithm code, before this month. I am going to run this same callgrind on that code.

jsmitchell (Thu, 20 Sep 2018 18:36:27 GMT):
i can't find info about wall clock time with callgrind, which is a bummer. Any waiting on locks is presumably going to be masked unless we can get wall clock time.

boydjohnson (Thu, 20 Sep 2018 18:39:32 GMT):
A stackoverflow post I read said a statistical sampling profiler like oprofile would take into account the idling time around locks.

jsmitchell (Thu, 20 Sep 2018 18:41:58 GMT):
that seems likely based on this profile data, actually

jsmitchell (Thu, 20 Sep 2018 18:42:21 GMT):
we know from the smallbank workload generator that generating and signing batches is very fast

jsmitchell (Thu, 20 Sep 2018 18:42:41 GMT):
looks like around 500k executions of that here

jsmitchell (Thu, 20 Sep 2018 18:43:24 GMT):
so, while it's a high percentage of cpu time, it's probably not a big percentage of overall clock time

boydjohnson (Thu, 20 Sep 2018 18:44:57 GMT):
So it got called a ton of times is what you are saying.

jsmitchell (Thu, 20 Sep 2018 18:45:42 GMT):
actually only ~160k executions

jsmitchell (Thu, 20 Sep 2018 18:46:42 GMT):
well, no. what I'm saying is that if get_block_from_main_cache..., for example, actually takes a much larger percentage of overall time because of waiting on locks, then that won't show up in these numbers

jsmitchell (Thu, 20 Sep 2018 18:46:58 GMT):
because it's measuring time on the cpu

jsmitchell (Thu, 20 Sep 2018 18:47:18 GMT):
so, because the signing stuff is cpu intensive, it will be overweighted on this picture

amundson (Thu, 20 Sep 2018 18:50:06 GMT):
how is this both validator and sawtooth_perf?

amundson (Thu, 20 Sep 2018 18:51:38 GMT):
this is from a unit test or something?

amundson (Thu, 20 Sep 2018 18:52:02 GMT):
@boydjohnson ^

boydjohnson (Thu, 20 Sep 2018 18:53:28 GMT):
Yes, I have a branch that brings in smallbank_workload (to create transactions), sawtooth_perf (to create batches), and a benchmark test that calls DuplicatesAndDependenciesValidation.validate_block a number of times.

amundson (Thu, 20 Sep 2018 18:55:12 GMT):
@boydjohnson so, that signing is client-side and really doesn't say anything about the validator code, right? or am I reading it incorrectly?

boydjohnson (Thu, 20 Sep 2018 18:55:32 GMT):
That is correct, it is sdk code.

amundson (Thu, 20 Sep 2018 18:55:46 GMT):
well, specifically, it has nothing to do with validator performance

amundson (Thu, 20 Sep 2018 18:56:14 GMT):
quite the reverse, I think this suggests the validator code being tested is working quite well

boydjohnson (Thu, 20 Sep 2018 18:56:28 GMT):
Except this is only time on cpu.

amundson (Thu, 20 Sep 2018 18:57:46 GMT):
right, from a CPU perspective

amundson (Thu, 20 Sep 2018 18:58:14 GMT):
the higher percent that client signing code has, the more efficient the validator code must be

adamludvik (Thu, 20 Sep 2018 18:59:04 GMT):
For reference/context, the goal of this testing is to determine how the performance of that check changed when we removed the concurrency problem, and whether we need to do additional optimization on that check.

adamludvik (Thu, 20 Sep 2018 18:59:57 GMT):
So we still need the "after we implemented the fix" graphs to answer that question.

adamludvik (Thu, 20 Sep 2018 19:00:04 GMT):
@boydjohnson did I get that right?

boydjohnson (Thu, 20 Sep 2018 19:02:50 GMT):
@adamludvik These graphs are after the fix. We need the pre-fix at least, and maybe the naive solution, too.

boydjohnson (Thu, 20 Sep 2018 19:03:04 GMT):
I am working on getting graphs pre-fix.

TomBarnes (Thu, 20 Sep 2018 19:05:15 GMT):
regarding feature roadmap, here is one example: https://wiki.postgresql.org/wiki/PostgreSQL11_Roadmap

TomBarnes (Thu, 20 Sep 2018 19:08:18 GMT):
and... https://rust.facepunch.com/roadmap/

TomBarnes (Thu, 20 Sep 2018 19:09:48 GMT):
looks like rust manages their roadmap through their RFC process - https://github.com/rust-lang/rfcs/blob/master/text/2314-roadmap-2018.md

kelly_ (Thu, 20 Sep 2018 19:10:51 GMT):
great links, thanks Tom

boydjohnson (Thu, 20 Sep 2018 19:59:19 GMT):
http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html This describes off-cpu flamegraphs.

boydjohnson (Thu, 20 Sep 2018 20:52:03 GMT):

test_duplicates_dependencies_prefix.png

boydjohnson (Thu, 20 Sep 2018 20:52:25 GMT):
@adamludvik @jsmitchell @amundson ^

adamludvik (Thu, 20 Sep 2018 20:54:14 GMT):
Just to clarify, the first graph was after and this is before?

boydjohnson (Thu, 20 Sep 2018 20:55:29 GMT):
Yes.

jsmitchell (Thu, 20 Sep 2018 21:19:03 GMT):
pretty significant difference

boydjohnson (Thu, 20 Sep 2018 21:22:26 GMT):
Yeah, it seems like we did good.

anandakumar.n (Fri, 21 Sep 2018 13:39:58 GMT):
Has joined the channel.

anandakumar.n (Fri, 21 Sep 2018 13:40:18 GMT):
Hi! i just want to know the reason ! why block header doesn't have time value!??

Dan (Fri, 21 Sep 2018 18:19:31 GMT):
There's some dialog on that over in #sawtooth if you scroll up a bit.

adamludvik (Fri, 21 Sep 2018 20:29:10 GMT):
I have updated the Consensus Engine API RFC to reflect changes made during implementation. Going to work on documentation next: https://github.com/hyperledger/sawtooth-rfcs/pull/4/commits/7355af743cd0c15674318a7fbd0e8b9112c440a4 Would like to get a final round of feedback and then enter FCP.

acdam.bacdam (Sun, 23 Sep 2018 12:07:10 GMT):
Has joined the channel.

acdam.bacdam (Sun, 23 Sep 2018 12:12:08 GMT):
I am new to Sawtooth. I read the documentation and created a simplewallet application to deposit, withdraw, and check the balance for any account holder/user. I want to implement an action to fetch the details of all the transactions that happened for that user, or in other words when that user deposited (including the amount) and when they withdrew: something like a passbook listing all transaction details. So far I came across the context setState and getState methods to fetch or set the current state. Can anyone help here with fetching the list of all transactions for a provided user?

danintel (Mon, 24 Sep 2018 16:45:04 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=rMmNenbb4XTSnC7og) @anandakumar.n Using timestamps in a distributed network is troublesome, mostly due to complex clock synchronization issues among peers. You could add a timestamp in your transaction family's transaction payload. If you want timestamps with blocks, refer to the BlockInfo transaction family. See: https://sawtooth.hyperledger.org/docs/core/releases/latest/transaction_family_specifications/blockinfo_transaction_family.html
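
A sketch of the payload-timestamp option, using an intkey-style CBOR payload; the `Timestamp` field is illustrative, not part of the intkey spec:
```python
# Carry a client-side timestamp inside the transaction payload, since block
# headers deliberately have no time field.
import time

import cbor  # the intkey family uses CBOR for payloads

payload = cbor.dumps({
    'Verb': 'set',
    'Name': 'foo',
    'Value': 42,
    'Timestamp': int(time.time()),  # interpreted and trusted only by the TP
})
```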

adamludvik (Mon, 24 Sep 2018 16:50:39 GMT):
@acdam.bacdam This is a good question for the #sawtooth channel.

adamludvik (Mon, 24 Sep 2018 16:51:31 GMT):
@danintel @anandakumar.n I believe there was a conversation related to this in the #sawtooth channel too.

VanC 7 (Tue, 25 Sep 2018 09:40:23 GMT):
Has joined the channel.

Dan (Tue, 25 Sep 2018 14:58:43 GMT):
@pschwarz I was looking to see how much we were using the python lmdb library. @TomBarnes mentioned that it is no longer maintained. https://github.com/dw/py-lmdb/blob/master/README.md#py-lmdb-needs-a-maintainer From my quick search it looks like the blockstore and receipt store still use the python library... ```validator/sawtooth_validator/server/core.py validator/sawtooth_validator/server/state_verifier.py``` But all the state stuff (i.e. merkle trie) uses rust. Did I get that right? {block store, receipt store} <-- Python {state} <-- Rust

Dan (Tue, 25 Sep 2018 15:13:50 GMT):
Slightly related, why do we have both of these.. they look very similar ```.//validator/src/database/lmdb.rs .//adm/src/database/lmdb.rs ```

adamludvik (Tue, 25 Sep 2018 16:02:57 GMT):
@Dan I believe you summarized that correctly. I am already working on pulling out python lmdb from the blockstore: https://github.com/aludvik/sawtooth-core/commits/use-rust-blockstore The receipt store is probably not too hard to do. Those are very similar because we copied the adm/ implementation to the validator/ package.

Dan (Tue, 25 Sep 2018 16:05:34 GMT):
Do you recall why we don't have adm just use the validator's database code? Is it a functional difference or just dependency limitation?

adamludvik (Tue, 25 Sep 2018 16:11:51 GMT):
I believe we wanted the packages to not depend on one another.

knkski (Tue, 25 Sep 2018 16:13:48 GMT):
Has joined the channel.

knkski (Tue, 25 Sep 2018 16:14:47 GMT):
Could I get another review on https://github.com/hyperledger/sawtooth-core/pull/1872?

adamludvik (Tue, 25 Sep 2018 16:19:02 GMT):
@agunde @amundson ^

amundson (Tue, 25 Sep 2018 16:39:54 GMT):
@knkski nice. I love the removal of that c code.

knkski (Tue, 25 Sep 2018 17:35:58 GMT):
:woo:

MicBowman (Wed, 26 Sep 2018 22:37:52 GMT):
the head field in the response to fetch_state is supposed to be the identity of the most recently committed block, right?

MicBowman (Wed, 26 Sep 2018 22:38:21 GMT):
i'm getting the same value for head no matter how many transactions are committed... using the developer validator

adamludvik (Thu, 27 Sep 2018 00:13:48 GMT):
@MicBowman can you give more context?

Dan (Thu, 27 Sep 2018 03:05:10 GMT):
Here's a video tutorial: https://www.youtube.com/watch?v=p85xwZ_OLX0

adamludvik (Thu, 27 Sep 2018 05:34:41 GMT):
@Dan you are my hero

thou_shalt (Thu, 27 Sep 2018 06:26:14 GMT):
Has joined the channel.

thou_shalt (Thu, 27 Sep 2018 06:26:37 GMT):
Hello, can somebody help with instructions on how to write your own consensus engine using the Python SDK (sawtooth_sdk.consensus)? Some simple example, for instance.

Dan (Thu, 27 Sep 2018 12:11:11 GMT):
@thou_shalt good question for #sawtooth-consensus-dev. In brief, though: the only example is PoET, and PoET is shimmed in from the approach that preceded engines, so it's not a good example. The more recent examples are written in Rust. This design doc may be of help to you: https://github.com/hyperledger/sawtooth-rfcs/pull/4/files

kelly_ (Thu, 27 Sep 2018 15:21:04 GMT):
what are people's thoughts on deprecating/stopping the publishing/posting of the 0.8 documentation?

kelly_ (Thu, 27 Sep 2018 15:22:08 GMT):
I've noticed that people reference it regularly in the #sawtooth channel, and also have noticed that it comes up on google searches sometimes which I think is adding confusion

rjones (Thu, 27 Sep 2018 15:22:52 GMT):
Add a header to it maybe? I wouldn't unpublish it.

rjones (Thu, 27 Sep 2018 15:23:31 GMT):
Someone out there is stuck on 0.8 for the next five years

kelly_ (Thu, 27 Sep 2018 15:35:57 GMT):
I'm afraid we're going to have more people stuck if we keep it up :)

kelly_ (Thu, 27 Sep 2018 15:36:05 GMT):
but that's a good thought

MicBowman (Thu, 27 Sep 2018 16:31:38 GMT):
@adamludvik just looking at the head field in the response; it keeps coming up as the same thing. i think, however, that it is in the section of my tests that are designed to fail, so there might not actually be any transactions being sent (well.. txns that should be committed). this is part of some exploration on how to get the validator to sign the results for get-state, get-transaction, and get-block operations (through the rest-api)

jsmitchell (Thu, 27 Sep 2018 18:58:43 GMT):
@MicBowman are blocks being published? If no blocks are being published, then I wouldn't expect head to change.
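
A quick way to check whether head is advancing, via the REST API (default endpoint assumed); `GET /state` reports the head the result was read from:
```python
# Poll the REST API and watch the head block id.
import time

import requests

for _ in range(3):
    head = requests.get('http://localhost:8008/state').json()['head']
    print('head:', head)
    time.sleep(5)  # if blocks are being published, head should change
```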

MicBowman (Thu, 27 Sep 2018 20:06:47 GMT):
@jsmitchell i think it was just the section of failed tests...

MicBowman (Thu, 27 Sep 2018 20:07:06 GMT):
i'm trying to find a way to get the validator to sign the result of the get-state

MicBowman (Thu, 27 Sep 2018 20:07:21 GMT):
so we can use it as an attestation of commit outside the ledger

jsmitchell (Thu, 27 Sep 2018 20:14:56 GMT):
could the client (the thing making the get call) sign it instead?

jsmitchell (Thu, 27 Sep 2018 20:34:37 GMT):
if not, i think you'd need to extend the protocol

MicBowman (Thu, 27 Sep 2018 20:57:44 GMT):
the client is the one no one trusts

MicBowman (Thu, 27 Sep 2018 20:57:59 GMT):
and, yes, it looks like there is no easy way to do this

MicBowman (Thu, 27 Sep 2018 20:58:10 GMT):
the keys to sign don't exist in the rest_api

MicBowman (Thu, 27 Sep 2018 20:58:22 GMT):
so it would have to be from the validator itself managing state

jsmitchell (Thu, 27 Sep 2018 21:02:49 GMT):
how are you deciding that the keys to sign are trustworthy?

jsmitchell (Thu, 27 Sep 2018 21:16:06 GMT):
meaning, if the rest api did have its own keys, and you had a mechanism for registering/trusting them, then that would be a solution which is on the 'client side' (we consider the rest api an example, and have written several different rest apis with different behaviors which all talk the same set of protocol messages to the validator)

MicBowman (Thu, 27 Sep 2018 23:19:10 GMT):
@jsmitchell yup

MicBowman (Thu, 27 Sep 2018 23:39:22 GMT):
trust in the keys will have to happen out of band... it's not a good solution to our bigger problem (externalizing claims about the state of the *ledger* rather than the state of a *validator*)

MicBowman (Thu, 27 Sep 2018 23:39:35 GMT):
but the out of band trust is a reasonable first step

MicBowman (Thu, 27 Sep 2018 23:40:15 GMT):
frankly... i can probably get away with just adding a key to the rest api... that seems like a very low impact way to get *some* attestation

MicBowman (Thu, 27 Sep 2018 23:40:33 GMT):
and it's not obvious why i would trust the validator (in isolation) any more than the rest api
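
A sketch of what "just adding a key to the rest api" could look like: sign a canonical form of the response body and attach the public key, so clients with out-of-band trust in that key get an attestation. Names are illustrative; this is not an existing sawtooth-rest-api feature:
```python
import json

from sawtooth_signing import create_context, CryptoFactory

context = create_context('secp256k1')
signer = CryptoFactory(context).new_signer(context.new_random_private_key())

def attested(body):
    # Canonicalize so the client can re-serialize and verify deterministically.
    canonical = json.dumps(body, sort_keys=True).encode()
    return {
        'data': body,
        'attestation': {
            'public_key': signer.get_public_key().as_hex(),
            'signature': signer.sign(canonical),
        },
    }
```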

jsmitchell (Fri, 28 Sep 2018 00:16:30 GMT):
👍🏻

thou_shalt (Fri, 28 Sep 2018 02:31:23 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=mTdkL3ZGwzxvnXTdD) @Dan thanks a lot

danintel (Fri, 28 Sep 2018 04:13:21 GMT):
Garbage Documentation
Screenshot_2018-09-27-21-09-23.png

danintel (Fri, 28 Sep 2018 04:15:10 GMT):
@kelly @rjones a sample of the pre-release doc clutter we have

LeonardoCarvalho (Fri, 28 Sep 2018 10:53:59 GMT):
@MicBowman I'm developing a little Spring REST gateway that will do almost what you're thinking of... I intend to use it to enroll external public keys and add them to the identity processor. The process will need an admin to approve the enrollment, and a key pair represents it. After that, the gateway will use the identity to authorize access to itself (or not), and the enrollment can be revoked by an admin at any time.

MicBowman (Fri, 28 Sep 2018 14:33:28 GMT):
@LeonardoCarvalho do you have more detail? that sounds promising

adamludvik (Fri, 28 Sep 2018 19:45:43 GMT):
@boydjohnson @jsmitchell I have got unit tests and devmode liveness passing with a pure-rust block store.

boydjohnson (Fri, 28 Sep 2018 19:46:32 GMT):
Sweet work, @adamludvik .

adamludvik (Fri, 28 Sep 2018 19:47:19 GMT):
If I can get this through an LR run, then access to the blockstore database will be pure Rust from the block publisher/chain controller down.

jsmitchell (Fri, 28 Sep 2018 19:47:34 GMT):
i like that

adamludvik (Fri, 28 Sep 2018 20:15:43 GMT):
PR is up https://github.com/hyperledger/sawtooth-core/pull/1885, will want to do some stability/performance testing before merging though

LeonardoCarvalho (Fri, 28 Sep 2018 21:44:49 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=hSc6MKgkhL6kLdQkB) @MicBowman I think tomorrow I can show a draft of my idea; it's based on Cisco's Simple Certificate Enrollment Protocol and a micro service architecture I've deployed in the past using Spring Security

resreassure (Fri, 28 Sep 2018 21:55:44 GMT):
Has joined the channel.

LeonardoCarvalho (Sat, 29 Sep 2018 12:12:23 GMT):
@MicBowman , there's the 10000ft view:

LeonardoCarvalho (Sat, 29 Sep 2018 12:12:30 GMT):

Clipboard - September 29, 2018 9:12 AM

LeonardoCarvalho (Sat, 29 Sep 2018 12:13:23 GMT):
The particular problem I need to solve is that Brazil's main cert chain is outside the worldwide chains...

LeonardoCarvalho (Sat, 29 Sep 2018 12:14:25 GMT):
And, since it's a little pricey for us (more than US$ 150), I intend to add a mechanism to create local CAs based on user-created keys, and probably PGP ones as well

LeonardoCarvalho (Sat, 29 Sep 2018 12:15:43 GMT):
Using Spring's capabilities, a single executable (signed and/or encrypted) jar can be deployed almost anywhere

LeonardoCarvalho (Sat, 29 Sep 2018 12:21:32 GMT):
the IETF draft is this: https://tools.ietf.org/html/draft-gutmann-scep-10

phaniac (Mon, 01 Oct 2018 18:18:15 GMT):
Has joined the channel.

wdmason (Tue, 02 Oct 2018 09:06:34 GMT):
Has joined the channel.

Gerhardvd (Tue, 02 Oct 2018 09:07:24 GMT):
Has joined the channel.

kirkwood (Fri, 05 Oct 2018 15:10:55 GMT):
I have filed https://jira.hyperledger.org/browse/STL-1461 against sawtooth-raft, or at least that was my intention. Was the plain ol' Hyperledger Sawtooth JIRA the best place?

adamludvik (Fri, 05 Oct 2018 15:17:28 GMT):
@kirkwood that looks fine.

kirkwood (Fri, 05 Oct 2018 15:24:28 GMT):
Terrific. I like the `adhoc` stuff; makes this sort of testing very smooth.

adamludvik (Fri, 05 Oct 2018 15:49:05 GMT):
Thanks!

adamludvik (Fri, 05 Oct 2018 15:49:26 GMT):
@agunde @Dan @jsmitchell @amundson @pschwarz I have been giving some thought to dependency and duplicate transaction validation, and I'd like to start a discussion around removing both checks from the core validator. Instead I'd like to consider making these checks the responsibility of app developers. This is probably not realistic for the near future, but I think the tradeoffs are interesting to consider.

Currently, the core validator is required to flag a transaction as invalid if either it has already been committed at some time in the past, or one of the dependencies it explicitly calls out has not been committed. This requires maintaining, indefinitely, a space-efficient data structure of all transactions ever committed that is also efficient enough to query that it doesn't slow performance. At 10 TPS this means keeping track of roughly a million new transactions per day, and being able to check this list for duplicates for every new transaction. I think this is a tall order over time.

Other platforms I have researched have essentially avoided the issue altogether by tying transactions to some "owned entity". Bitcoin ties transactions to a coin owned by some key. Ethereum ties transactions to an address and requires the nonce to match, so that a duplicate transaction would fail the second time around. In other words, both of these platforms are able to leverage domain-specific knowledge to determine that duplicate transactions are invalid. But domain-specific knowledge is something we don't have in the core validator.

In the case of transaction dependencies, I think we already do this validation in the application layer for many transaction types. XO transactions depend on a game being created before it can be played, and transactions are marked as invalid if the game doesn't exist. Smallbank transactions depend on both accounts existing. Additionally, dependencies can be enforced by batching transactions together.

In the case of duplicate transactions, this change would place an extra burden on application developers to do this check somehow, but we do have some examples already. The same XO transaction cannot be applied twice, because the position it attempts to take will already have been taken. Seth transactions use the nonce strategy from Ethereum. Smallbank does not have this check, but modifying it to include an account nonce or a scheme similar to Bitcoin's seems simple (see the sketch below).

The other downside to this change is that the core platform would no longer be able to assume a transaction maps to at most one block. In spite of the downsides, I think this change is still interesting to consider and to gather more data on. I suspect that for a large network, performance will be dragged down. I also suspect that removing these checks will simplify and speed up block validation by offloading all transaction validation to the transaction processors (with the exception of signature and structure).
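
A sketch of the app-level nonce idea referenced above, Ethereum-style: each account keeps a nonce in state, and a transaction is valid only if its payload nonce matches. The state layout and names are illustrative, not an existing family:
```python
import cbor

from sawtooth_sdk.processor.exceptions import InvalidTransaction

def check_and_bump_nonce(context, account_address, payload_nonce):
    entries = context.get_state([account_address])
    account = cbor.loads(entries[0].data) if entries else {'nonce': 0}
    if payload_nonce != account['nonce']:
        # A replayed (or reordered) txn fails deterministically here, so the
        # core validator would not need a global committed-txn-id set.
        raise InvalidTransaction('expected nonce {}, got {}'.format(
            account['nonce'], payload_nonce))
    account['nonce'] += 1
    context.set_state({account_address: cbor.dumps(account)})
```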

jsmitchell (Fri, 05 Oct 2018 15:59:14 GMT):
what's your proposal for intkey that doesn't introduce some weird versioning/serialization smell?

adamludvik (Fri, 05 Oct 2018 16:18:59 GMT):
I think it is more interesting to collaborate on a higher level first to determine: 1. if our current solution actually works long term, 2. if moving this validation to the application layer improves performance or design simplicity, and 3. whether it is an unreasonable request of application developers to solve this problem. Intkey is a weird example because it is contrived to be simplistic and everyone owns everything, but I agree it needs to be considered if 1-3 show promising results. It also doesn't seem that outrageous for a contrived example to allow duplicate transactions and resubmitting the same transaction by other parties.

adamludvik (Fri, 05 Oct 2018 16:19:53 GMT):
Primarily what I am looking for is whether this seems like an interesting idea to pursue, or a waste of time.

jsmitchell (Fri, 05 Oct 2018 16:27:37 GMT):
I don't think "everyone owns everything" is the reason intkey is problematic. I think it's problematic because the operations are valid in any order, and certainly the same operation can happen twice (which is semantically the same as currency-transfer operations from 'owned' accounts, unless something like UTXO is adopted). The contract with the user is that when they sign a transaction, it is executed no more than once. That's how we avoid double spend at the platform level. I agree that an existence check against an unbounded set is infeasible, but we need an answer on how we preserve key platform guarantees with any alternate proposal.

adamludvik (Fri, 05 Oct 2018 16:45:18 GMT):
Contract with which user? The application developer or the application user? I think the interesting question is "Does avoiding double spend at the platform level force us to do existence checks against an unbounded set?" I suspect the answer is yes, since the more efficient solutions I have come across so far all require domain specific knowledge (though I am still looking). I think it is worth explicitly considering the tradeoff between breaking this contract with the application developer at some far point in the future and committing to doing this inefficient check indefinitely.

jsmitchell (Fri, 05 Oct 2018 16:46:30 GMT):
the contract with the entity signing a transaction

benoit.razet (Fri, 05 Oct 2018 16:50:23 GMT):
would a guarantee limited in time be an option? Like, the validator guarantees to keep a record of the transactions of the last year, and after that the burden of protection against replay is at the contract level. That would break no one, since no one has run sawtooth for more than a year ;)

jsmitchell (Fri, 05 Oct 2018 16:50:49 GMT):
replay attacks are a real thing

amundson (Fri, 05 Oct 2018 19:00:16 GMT):
@adamludvik I don't think that moving the complexity into applications is good. First, we would end up with a lot of apps that don't solve double-spend, because you need to understand it before you can protect against it. But also, we could solve this in a much easier way by adding a field to the batch which specifies a block_id that must be in the history no more than, say, 1000 blocks back. Then batches "time out" if they don't make it onto the chain fast enough. If the depth is great enough, no one has to think about it much. However, it would require clients to consult the chain prior to putting together the batch, which is not currently required.

amundson (Fri, 05 Oct 2018 19:02:53 GMT):
this would seem to be important to solve in the context of checkpointing. so maybe it should just be the checkpoint_block_id (initially the genesis block), and you solve the problem when you update to a new checkpoint block. it would be awkward around the checkpointing boundary though, which would have to be considered. (a sketch of the expiration check is below)
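
A sketch of the expiration rule: a txn header carries a recent block id, and validation rejects the txn once that block is more than MAX_AGE blocks behind the chain head. The header field and block-store API used here are hypothetical, for illustration only:
```python
MAX_AGE = 1000

def txn_expired(anchor_block_id, block_store):
    anchor = block_store.get(anchor_block_id)  # hypothetical lookup by id
    if anchor is None:
        return True  # unknown anchor: reject instead of searching history
    return block_store.chain_head.block_num - anchor.block_num > MAX_AGE
```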

jsmitchell (Fri, 05 Oct 2018 19:39:38 GMT):
i think that would have to be done in the txn header, not the batch header

jsmitchell (Fri, 05 Oct 2018 19:41:02 GMT):
but I like the properties of constraining the search space for duplicates to a small maximum and saying "too old" otherwise

amundson (Fri, 05 Oct 2018 20:00:04 GMT):
could be the txn header

amundson (Fri, 05 Oct 2018 20:00:46 GMT):
yeah, would have to be

adamludvik (Fri, 05 Oct 2018 20:31:00 GMT):
What @amundson suggested seems similar to what @benoit.razet suggested, except using block height as a timer instead of wall clock time.

adamludvik (Fri, 05 Oct 2018 20:33:03 GMT):
You could also just define "too old" as after checkpointing

adamludvik (Fri, 05 Oct 2018 21:47:11 GMT):
Putting an expiration on batches seems good

adamludvik (Sat, 06 Oct 2018 02:40:10 GMT):
We could require including the current checkpoint block in the batch before submitting, and make all batches not committed expire after the checkpoint

jsmitchell (Sat, 06 Oct 2018 12:36:13 GMT):
Regardless of what you do at the batch level (optional) you must do it at the txn level to prevent someone from including an old txn in a new batch

jsmitchell (Sat, 06 Oct 2018 12:44:07 GMT):
I think coupling it to checkpointing may introduce some undesirable effects, like a bunch of invalid txns immediately after a checkpoint. Since verification of existence requires the blockstore, not the global state, we can retrieve the required N blocks prior to the checkpoint without mandating that their state root hashes exist in global state.

kelly_ (Sat, 06 Oct 2018 18:41:25 GMT):
@adamludvik have you looked into sparse merkle trees?

kelly_ (Sat, 06 Oct 2018 18:41:43 GMT):
it may be a more efficient way to see if that transaction has been included

kelly_ (Sat, 06 Oct 2018 18:43:08 GMT):
A Sparse Merkle Tree is a really interesting variant of a Merkle Tree. It was first proposed by Google for tracking whether certificates have been revoked or not: https://www.links.org/files/RevocationTransparency.pdf. It works by creating a massive, uncomputably large binary merkle tree composed of 2²⁵⁶ default leaves (if it's assumed you are using a 256-bit hash function, like sha256). Because most paths in the merkle tree (256 hashes down) have a default value, it's possible to represent this data structure without having to keep it all in state. Different sizes can also be used; it doesn't have to be 2²⁵⁶.

The reason this is valuable is that you deterministically know what position in the tree any piece of information will hold (like a token ID or the hash of any certificate). It will always be in the same position. So for Google, hashing a certificate will always map it to a specific leaf in the tree. In Google's case, each leaf is either zero or one, where one means it has been revoked. Doing it this way means that submitting a merkle root of the whole SMT not only shows/includes a cheap-to-verify proof of which certificates have been revoked, but ALSO includes proofs of certificates that have NOT been revoked (their leaf value is just zero). So, proving that a leaf exists or contains some data is also a proof of non-inclusion of the rest of the state.

kelly_ (Sat, 06 Oct 2018 18:43:17 GMT):
https://blog.ujomusic.com/a-plasma-cash-primer-27dcfd1d5ddc

kelly_ (Sat, 06 Oct 2018 18:45:22 GMT):
so just thinking from a query perspective, if the hash of a transaction is not in the sparse merkle tree than that is proof of non-inclusion

kelly_ (Sat, 06 Oct 2018 18:46:35 GMT):
a paper here - https://eprint.iacr.org/2016/683.pdf

kelly_ (Sat, 06 Oct 2018 18:47:51 GMT):
"We show that our definitions enable efficient space-time trade-offs for different caching strategies, and that *verifiable audit paths can be generated to prove (non-)membership in practically constant time (< 4 ms)* when using SHA-512/256"

kelly_ (Sat, 06 Oct 2018 19:02:00 GMT):
https://github.com/armaniferrante/sparse-merkle-tree/blob/master/src/lib.rs
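
To make the default-hash trick concrete, a toy Python SMT over a small keyspace; illustrative only, not Sawtooth code (depth 256 works the same way, the precomputed defaults are what keep it tractable):
```python
import hashlib

DEPTH = 8  # 2**8 leaves; real SMTs use 256

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# default[d] = hash of an empty subtree of height d
default = [h(b'')]
for _ in range(DEPTH):
    default.append(h(default[-1] + default[-1]))

def root(leaves):
    """leaves maps an int index to a leaf hash; everything else is default."""
    level = dict(leaves)
    for d in range(DEPTH):
        parents = {}
        for i in set(k // 2 for k in level):
            left = level.get(2 * i, default[d])
            right = level.get(2 * i + 1, default[d])
            parents[i] = h(left + right)
        level = parents
    return level.get(0, default[DEPTH])

empty_root = root({})
one_member = root({5: h(b'txn-id')})
# Proving index 5 is (or is not) set needs only DEPTH sibling hashes,
# most of which are the precomputed defaults.
```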

LeonardoCarvalho (Sat, 06 Oct 2018 19:35:40 GMT):
Well, Sawtooth uses this structure already: https://chat.hyperledger.org/channel/sawtooth?msg=7MK7Hq9GHbgr9NoKj

HandsomeRoger (Mon, 08 Oct 2018 12:12:12 GMT):
Has joined the channel.

benoit.razet (Mon, 08 Oct 2018 12:27:06 GMT):
My apologies for potentially spamming the channel, but I wanted to share some thoughts related to the txn-id membership problem, based on my year of experience with Sawtooth and trying to understand where it stands compared to other blockchain platforms.

I see Sawtooth as an interesting platform because a lot of the features that are set in stone in other blockchain solutions are pluggable in Sawtooth. But this flexibility brings up a number of challenges for a deployed sawtooth network to stay manageable. Below is a list of problems that are left open with barebone sawtooth-core but can be solved at various levels (operator level, app developer level, app framework level):
- nondeterminism in a transaction family, which can impact the consensus of the network
- unbounded txn runtime execution or inefficiencies in a transaction family implementation, which can impact the performance of the network
- unbounded space consumption (set_state leak), which can impact the performance of the network over time because of space limitations
- namespace restriction, which can impact the integrity of the global state store
- TP maintenance for chain replay, which can impact auditability

What's common to all of them is that they can impact the performance/correctness of the platform as a whole, yet sawtooth-core leaves these problems unsolved. Now, given that the txn replay problem can definitely be solved at the application level, and keeping in mind that a txn replay attack can only impact the correctness of an app that does not prevent it: as a sawtooth user I'd be fine if sawtooth-core does not solve this problem, but if there exists a workable solution, all the better.

Related to the addition of a block_id field to the transaction protobuf: I consider Sawtooth as both a blockchain and a smart contract platform, the latter being arguably independent from the blockchain. To support this idea, the transaction protobuf does not refer to anything block-related or global-state-related (like a merkle root hash). As a result, individual transactions are not tied to any particular blockchain instance and could be executed in different deployed environments. The underlying feature here is that transactions are _portable_ between blockchains. Adding a block_id field to the transaction protobuf would break this property. I concede that the portability feature may not be commonly used, but I would not neglect it, because smart contracts are an interesting paradigm that need not be too closely coupled with blockchain.

jsmitchell (Mon, 08 Oct 2018 13:09:12 GMT):
@benoit.razet do you agree that all those issues are mitigated when using seth, for example (because of the additional enforcement on contract storage/gas/limited opcodes from the evm interpreter)?

benoit.razet (Mon, 08 Oct 2018 13:09:51 GMT):
completely agree, that's what I realized this week

jsmitchell (Mon, 08 Oct 2018 13:10:01 GMT):
I feel like they are, which is an existence proof for being able to solve these problems at that layer

jsmitchell (Mon, 08 Oct 2018 13:10:25 GMT):
without calcifying the core

jsmitchell (Mon, 08 Oct 2018 13:12:11 GMT):
it would be nice to have an out-of-the-box TP with those features, uncoupled from ethereum

benoit.razet (Mon, 08 Oct 2018 13:13:03 GMT):
I had completely overlooked that seth fixed all these issues, instead I considered seth as just an effort to potentially attract eth aficionados to sawtooth

benoit.razet (Mon, 08 Oct 2018 13:13:18 GMT):
it would be nice if sabre could provide these guarantees too

amundson (Mon, 08 Oct 2018 13:14:24 GMT):
that last point around block_id - not sure about portable between blockchains (I think that would need to be shown with a second blockchain impl), but we definitely plan to run smart contract TPs in different configurations that do not involve blocks

jsmitchell (Mon, 08 Oct 2018 13:14:32 GMT):
I think the more powerful idea is that those concepts can be enforced in the application domain

jsmitchell (Mon, 08 Oct 2018 13:14:39 GMT):
which seth demonstrates

benoit.razet (Mon, 08 Oct 2018 13:26:03 GMT):
@amundson maybe portable is too strong of a term, but I still think it has value. I'm not 100% sure, but bitcoin txns also have this property, though it's a side effect of a transaction being the asset; and arguably Ethereum also has this property, because the nonce in the txn is a monotonic counter linked to the account of the user, so it's more a local property and not a global one.

Dan (Mon, 08 Oct 2018 13:34:23 GMT):
I don't know how important the name is, but we could call it checkpoint_id or something instead of block_id. I'm aware of at least one place, if not two, where someone might want to run a TP outside of sawtooth proper. If you have a sensor or some device with limited connectivity, you might not want a requirement for too-frequent check-ins with the blockchain. In those environments an infrequent checkpoint might be ok (with the caveat noted above that times around those checkpoints need some thought).

adamludvik (Mon, 08 Oct 2018 15:55:53 GMT):
@kelly_ SMTs look good, but they still grow unbounded with the length of the chain, which I think is a problem we want to address. I had considered using radix tree previously in a similar way. At the end of this article, they mention something called "Plasma Cash" that seems to be using it as a solution to the problem we discussed. https://medium.com/@kelvinfichter/whats-a-sparse-merkle-tree-acda70aeb837

adamludvik (Mon, 08 Oct 2018 15:58:55 GMT):
@benoit.razet I agree that keeping transactions portable across deployments and/or implementations is a desirable feature and any changes should take this into consideration.

Dan (Mon, 08 Oct 2018 16:04:05 GMT):
The academic paper on SMTs suggested that the expected case would be a balanced tree with a uniform distribution across the leaves. That seems like a worst case to me because then you've minimized the caching/pruning you can do. It went into some effort to talk about clustering as though that's bad .. I guess leading to an imbalanced tree. Maybe the cost of the proof is higher there because you can't list as many default values. But the overall cost of storing the structure seems better because you can represent more of the tree by defaults - so it's actually sparse.

Dan (Mon, 08 Oct 2018 16:04:14 GMT):
I feel like I'm missing something.

kelly_ (Mon, 08 Oct 2018 16:21:43 GMT):
if the value is default then you don't include it in the tree

kelly_ (Mon, 08 Oct 2018 16:22:34 GMT):
@adamludvik agree on the unbounded length. I don't believe plasma cash solves it; the plasma cash tree is bounded by the number of tokens represented in the tree, so it isn't impacted by chain length

kelly_ (Mon, 08 Oct 2018 16:28:40 GMT):
@dan i think the assumption is uniform because the tree covers the entire namespace and hashes occur randomly across that namespace

Dan (Mon, 08 Oct 2018 17:23:21 GMT):
Yeah I guess my point was, if you imagine the leaf nodes as a bit string, 11110000 implies you don't have anything to store for half the tree, but 10101010 implies you need to store (or reconstruct) all of the branches.

kelly_ (Mon, 08 Oct 2018 18:13:04 GMT):
right

Dan (Mon, 08 Oct 2018 20:16:20 GMT):
... and 10101010 represents uniform distribution which seems worse than clustering.
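A toy count to illustrate (my own sketch, assuming a depth-3 binary tree where any all-default subtree collapses to a single precomputed default hash):
```
# Count how many nodes of a depth-3 binary tree must actually be stored
# when empty subtrees collapse to one precomputed default hash.
def stored_nodes(leaves):
    """leaves: string of '0'/'1'. Returns node count that can't default."""
    def count(lo, hi):
        if '1' not in leaves[lo:hi]:
            return 0                  # whole subtree is default
        if hi - lo == 1:
            return 1                  # a set leaf
        mid = (lo + hi) // 2
        return 1 + count(lo, mid) + count(mid, hi)
    return count(0, len(leaves))

print(stored_nodes('11110000'))  # 8  -> clustered keys, half the tree defaults
print(stored_nodes('10101010'))  # 11 -> uniform keys force every branch
```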

kirkwood (Thu, 11 Oct 2018 18:05:11 GMT):
Outlined a potential fix that seems to work in https://jira.hyperledger.org/browse/STL-1461; feedback welcome. Especially feedback of the form, "no, don't do that."

kirkwood (Thu, 11 Oct 2018 18:19:45 GMT):
Man, I keep meaning to put these messages in #sawtooth-consensus-dev . . .

Dan (Thu, 11 Oct 2018 18:24:32 GMT):
awesome!! I'll do a little redirect over in the consensus channel.

bobonana (Thu, 11 Oct 2018 18:50:38 GMT):
Has joined the channel.

bobonana (Thu, 11 Oct 2018 18:53:47 GMT):
is there any chance that we could get the sawtooth-rest-api package https://github.com/hyperledger/sawtooth-core/tree/master/rest_api available for installation through pip? it's available in APT as `python3-sawtooth-rest-api`

cuevrob (Thu, 11 Oct 2018 21:11:21 GMT):
Has left the channel.

danintel (Fri, 12 Oct 2018 17:57:13 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=nkNWtN7bhYfkDd6os) @bobonana I don't make that decision, but I would file an issue at https://jira.hyperledger.org/projects/STL so others can hang their hat there and mention they need it too.

Dan (Fri, 12 Oct 2018 20:03:13 GMT):
Do we publish any wheels? I think we just publish debs.

amundson (Sat, 13 Oct 2018 02:54:39 GMT):
@bobonana this was discussed briefly in next-directory chat. resolution is basically, don't import code from that sawtooth-rest-api, it's not a library.

MohitJuneja (Tue, 16 Oct 2018 03:27:27 GMT):
Has joined the channel.

kelly_ (Tue, 16 Oct 2018 16:47:28 GMT):
https://docs.google.com/document/d/1bQQGT8PKZXzhdceNVRIXoL9NQpdQCY8kGcOrUa-opFY/edit?usp=sharing

kelly_ (Tue, 16 Oct 2018 16:48:01 GMT):
^ blog detailing new 1.1 features and ecosystem growth/apps since 1.0 - would appreciate any feedback

kelly_ (Tue, 16 Oct 2018 16:48:29 GMT):
@adamludvik @achenette

boydjohnson (Tue, 16 Oct 2018 17:02:14 GMT):
@kelly_ Would we want to mention some of the apps built with Sawtooth that have presented at Tech Forum: the Remme folks, @FrankCastellucci's folks?

adamludvik (Tue, 16 Oct 2018 17:08:35 GMT):
@kelly_ how would you like feedback?

kelly_ (Tue, 16 Oct 2018 17:14:25 GMT):
@adamludvik - comments on the doc would be preferred, if there are spelling or grammar mistakes just put track changes on and make them

adamludvik (Tue, 16 Oct 2018 17:14:42 GMT):
sounds good

kelly_ (Tue, 16 Oct 2018 17:15:00 GMT):
@boydjohnson yea I've been thinking about linking to this: https://github.com/hyperledger/sawtooth-website/blob/0e8b4e181ea6deaae4f979a4dadef448c7c987a6/Users.md

kelly_ (Tue, 16 Oct 2018 17:15:35 GMT):
haven't had the time to make the wordwrap changes to get it merged

kelly_ (Tue, 16 Oct 2018 17:15:56 GMT):
linking to a PR seems like a bad practice from a longevity perspective. I guess I could just put it in a google doc and link to that

boydjohnson (Tue, 16 Oct 2018 17:16:13 GMT):
Yeah, looks good, anyway.

adamludvik (Tue, 16 Oct 2018 17:19:25 GMT):
Looks good, couple minor comments.

achenette (Tue, 16 Oct 2018 20:09:21 GMT):
@kelly_ - Nice blog post! I added some comments/questions, plus a handful of wording and grammar tweaks as suggestions (I couldn't find a way to turn on "tracked changes").

kelly_ (Tue, 16 Oct 2018 20:48:45 GMT):
Awesome thanks! @achenette

grapebaba (Wed, 17 Oct 2018 01:43:35 GMT):
Has joined the channel.

grapebaba (Wed, 17 Oct 2018 01:50:30 GMT):
hi guys, I am a newbie here and I have a question. I explored some tasks in JIRA; however, I can't find any design docs attached to the JIRA tasks

Dan (Wed, 17 Oct 2018 12:45:36 GMT):
link?

kelly_ (Wed, 17 Oct 2018 16:56:51 GMT):
@amundson from a rust conversion perspective what do you think is the right number? github stats on sawtooth core say ~25%, i remember looking at a validator block diagram and it was ~1/2 of the main components

kelly_ (Wed, 17 Oct 2018 16:57:06 GMT):
thats weird, don't know why that did a strikethrough

kelly_ (Wed, 17 Oct 2018 16:57:26 GMT):
oh cause i used ~

kelly_ (Wed, 17 Oct 2018 16:58:48 GMT):
is 40% a good number?

adamludvik (Wed, 17 Oct 2018 19:30:57 GMT):
@kelly_ the numbers on github are going to be skewed because they include a number of transaction families and utilities written in rust that are not the validator. roughly speaking, the architecture consists of the following layers: networking, handler/routing layer, journal, database, and transaction execution. So far, the journal and database layers have been mostly rewritten in Rust. Networking, handler/routing, and transaction execution are still Python. So if you want to say 40% of the architectural layers are in Rust, I think that would be pretty accurate. The layers are definitely not identical in size though, so without doing an actual line count it's hard to come up with a verifiable percentage.

kelly_ (Wed, 17 Oct 2018 19:41:18 GMT):
@adamludvik thanks I appreciate that, will change it to 40%. it's not perfect but a good high-level indication

kelly_ (Wed, 17 Oct 2018 19:41:33 GMT):
also noted on github, on the flip side there are a bunch of tests that are written in python which may skew it the other way

jsmitchell (Wed, 17 Oct 2018 19:41:43 GMT):
I think there is also some python code that is no longer fully used but is still in the repo

danintel (Wed, 17 Oct 2018 22:22:55 GMT):
@kelly_ You may want to mention in the `Users.md` list that it's not vetted or something like that.

kelly_ (Thu, 18 Oct 2018 13:56:26 GMT):
yea good call @danintel

kelly_ (Thu, 18 Oct 2018 13:56:37 GMT):
something like 'not an endorsement'

silasdavis (Thu, 18 Oct 2018 15:01:51 GMT):
@agunde thanks for the update, 1.1 sounds like it will be a nice release

silasdavis (Thu, 18 Oct 2018 15:02:31 GMT):
How vanilla (for want of a better word) is the PBFT implementation in terms of the original paper?

silasdavis (Thu, 18 Oct 2018 15:03:01 GMT):
There are a number of optimisations that most PBFT-likes make particular aeoun

kelly_ (Thu, 18 Oct 2018 15:03:15 GMT):
@bridgerherman may be able to answer

kelly_ (Thu, 18 Oct 2018 15:03:23 GMT):
I believe that it was meant to follow the original pbft paper

kelly_ (Thu, 18 Oct 2018 15:03:40 GMT):
is there a tldr; on what aeoun does?

silasdavis (Thu, 18 Oct 2018 15:03:51 GMT):
Oops... Around worst case performance in particular

kelly_ (Thu, 18 Oct 2018 15:03:59 GMT):
oh ha

kelly_ (Thu, 18 Oct 2018 15:04:01 GMT):
got ya

silasdavis (Thu, 18 Oct 2018 15:04:05 GMT):
Haha sorry typing on phone

kelly_ (Thu, 18 Oct 2018 15:04:57 GMT):
That's interesting, in general I have seen optimizations for the optimistic scenario more so than around the worst-case

agunde (Thu, 18 Oct 2018 15:09:20 GMT):
@silasdavis here is the current introduction documentation for pbft https://github.com/hyperledger/sawtooth-pbft/blob/master/docs/source/introduction-to-sawtooth-pbft.rst You can see there that the implementation is based on the 1999 paper with some adaptations for blockchain. Also here is a description of the work being done now in an effort to make it ready for production https://github.com/hyperledger/sawtooth-pbft/blob/master/docs/source/future-work.rst

grapebaba (Thu, 18 Oct 2018 15:38:43 GMT):
Hi guys

grapebaba (Thu, 18 Oct 2018 15:40:02 GMT):
the sabre project does not support smart contract transaction family right now?

zac (Thu, 18 Oct 2018 15:46:17 GMT):
Sabre is a smart contract transaction family

agunde (Thu, 18 Oct 2018 15:48:15 GMT):
@grapebaba We have a #sawtooth-sabre channel.

adamludvik (Thu, 18 Oct 2018 16:25:34 GMT):
@silasdavis it is quite vanilla

adamludvik (Thu, 18 Oct 2018 16:28:07 GMT):
I am working on some basic stuff to mitigate the issues with worst-case performance, but long term we hope to implement additional algorithms with better worst-case performance. I view Sawtooth PBFT as kind of "getting our feet wet" in terms of proving out our new API and understanding the level of difficulty in implementing additional algorithms.

adamludvik (Thu, 18 Oct 2018 16:29:06 GMT):
I am especially interested in the rBFT algorithm that Indy chose for situations where worst-case performance is important.

silasdavis (Fri, 19 Oct 2018 11:45:01 GMT):
@adamludvik sounds good, if you are at all interested in porting the Tendermint consensus state machine I could approach people about that, they have a fairly good formal description on the way are are working on machine proofs

silasdavis (Fri, 19 Oct 2018 11:47:06 GMT):
snow white might be worth a look also

silasdavis (Fri, 19 Oct 2018 11:47:16 GMT):
https://eprint.iacr.org/2016/919.pdf

adamludvik (Fri, 19 Oct 2018 15:38:24 GMT):
We strongly considered Tendermint early on and I think I remember @Dan maybe having some interest in it? I have not heard of snow white, will take a look.

Dan (Fri, 19 Oct 2018 15:46:04 GMT):
Yeah tendermint is interesting. The thing @adamludvik helped us understand is that the tendermint code itself draws a larger box than what we would draw around consensus .. i.e. it manages network layer etc. The exercise of rewriting just the tendermint algorithm would be cool but not small.

Dan (Fri, 19 Oct 2018 15:47:31 GMT):
I couldn't articulate off the top of my head what the advantages of tendermint are over pbft. Do you have those handy @silasdavis

silasdavis (Fri, 19 Oct 2018 16:11:53 GMT):
So given you are in the market for implementing PBFT yourselves - the core Tendermint algorithm (as opposed to Tendermint the project) should be reasonably approachable - a little more involved than PBFT - possibly - but not a lot. The layers on top to make it work well in a real system is obviously non-trivial.

silasdavis (Fri, 19 Oct 2018 16:12:36 GMT):
But I was suggesting you might consider implementing the core in rust yourselves - you would not be beholden to their P2P layer or ABCI interface

silasdavis (Fri, 19 Oct 2018 16:12:47 GMT):
here is a recent paper that should give a good summary: https://arxiv.org/pdf/1807.04938.pdf

silasdavis (Fri, 19 Oct 2018 16:13:36 GMT):
the main difference is that they have a termination mechanism that gives them a single mode of operation, and this can be less costly than the recovery phase in PBFT

silasdavis (Fri, 19 Oct 2018 16:19:34 GMT):
There's obviously plenty of complexity around how you validate transactions and make sure the decision value you are using is up-to-date, and gossip etc. But I think they've refined down the core to something fairly nice in that recent paper

silasdavis (Fri, 19 Oct 2018 16:19:51 GMT):
snow white: https://eprint.iacr.org/2016/919.pdf

amundson (Fri, 19 Oct 2018 16:21:16 GMT):
@silasdavis any thoughts about honeybadger? I see they have a start on a rust implementation (https://github.com/rphmeier/honeybadger)

amundson (Fri, 19 Oct 2018 16:21:49 GMT):
(like, a very minimal start, but at least there is some interest)

silasdavis (Fri, 19 Oct 2018 16:39:00 GMT):
yeah I was actually going to suggest that, though I wasn't sure what your finality requirements are

silasdavis (Fri, 19 Oct 2018 16:40:55 GMT):
it's in the fully async model and potentially has quite high latency (i.e. convergence to a decision with probability 1, but not sure how it responds under various network conditions/attacks - probably better on large networks where the law of large numbers can help)

silasdavis (Fri, 19 Oct 2018 16:41:08 GMT):
but it would definitely be an interesting one to have in the mix

silasdavis (Fri, 19 Oct 2018 16:41:21 GMT):
as would hashgraph though patent encumbered in US

silasdavis (Fri, 19 Oct 2018 16:41:40 GMT):
incidentally I'm always going back to these notes: http://www.scs.stanford.edu/17au-cs244b/notes/

silasdavis (Fri, 19 Oct 2018 16:42:15 GMT):
when trying to remember the tradeoffs, they're really succinct and actually have decent sketch proofs for some properties

silasdavis (Fri, 19 Oct 2018 16:42:31 GMT):
e.g. for HoneyBadger: http://www.scs.stanford.edu/17au-cs244b/notes/honey-badger.txt

amundson (Fri, 19 Oct 2018 16:48:12 GMT):
this is great stuff for #sawtooth-consensus-dev if we start to dive deeper

kthblmfld (Fri, 19 Oct 2018 18:40:56 GMT):
Has left the channel.

arsulegai (Wed, 24 Oct 2018 06:06:15 GMT):
Any suggestions / resources to help me with mocking hyper library to test server/client?

Dan (Wed, 24 Oct 2018 16:58:21 GMT):
@arsulegai try in #sawtooth

arsulegai (Wed, 24 Oct 2018 18:03:52 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=e4KT4KA9zJHCNoPTD) @Dan Sure, thanks

MatthewRubino (Fri, 26 Oct 2018 13:01:38 GMT):
Has joined the channel.

ApurvTandon (Sun, 28 Oct 2018 20:13:14 GMT):
Has joined the channel.

Dan (Mon, 29 Oct 2018 15:37:35 GMT):
@agunde Is the intent of the `unset` here https://github.com/hyperledger/sawtooth-core/blame/d3bf28029090ae38a4d743d417c7ca15b6cd7e1e/protos/identity.proto#L24 That the transaction will fail unless the sender explicitly sets `permit` or `deny`?

agunde (Mon, 29 Oct 2018 15:49:12 GMT):
Yes, that is standard for all protobuf enums we use. If you don't have an unset option set to 0, whatever you set to 0 will be the default.

agunde (Mon, 29 Oct 2018 15:50:32 GMT):
Which can cause some fun bugs. @Dan
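A minimal sketch of why (the enum values follow identity.proto as I read it; the validation helper is hypothetical): in proto3 an unset enum field deserializes to 0, so 0 has to mean "unset":
```
from enum import IntEnum

# In proto3 an unset enum field reads back as 0, so whatever value is
# assigned to 0 is indistinguishable from "the sender never set it".
class EntryType(IntEnum):
    ENTRY_TYPE_UNSET = 0  # reserved so "forgot to set it" is detectable
    PERMIT_KEY = 1
    DENY_KEY = 2

def validate(entry_type):
    # hypothetical helper: reject entries that never chose permit/deny
    if EntryType(entry_type) is EntryType.ENTRY_TYPE_UNSET:
        raise ValueError('entry type must be PERMIT_KEY or DENY_KEY')

validate(EntryType.DENY_KEY)  # ok
# validate(0)                 # would raise: the field was left unset
```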

Dan (Mon, 29 Oct 2018 16:00:57 GMT):
thx :)

arsulegai (Wed, 31 Oct 2018 09:38:29 GMT):
Wanted to open discussion on the Jira item - https://jira.hyperledger.org/browse/STL-1375 There are two parts to it. *Part 1:* allow ``context.get_state()`` (or, depending on the language, ``stateInstance.getState()``) to accept partial addresses. If a partial address is supplied, return a Map of key: address and value: state_data matching the prefix. --> I am planning to extend the current SDK method to get this done, i.e. get_state() will now allow queries by partial addresses. *Part 2:* get the list of all addresses where there's non-empty data. I will introduce a new method, something like ``context.list_data_addresses()``. Please feel free to comment / suggest your views
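To show what Part 1 would return, here's a toy model (a plain dict standing in for global state; the addresses and data are made up, this is not the real SDK code):
```
# Toy model of a prefix-aware get_state: global state as {address: data}.
state = {
    '1cf126' + 'a' * 64: b'{"count": 1}',
    '1cf126' + 'b' * 64: b'{"count": 2}',
    '5b7349' + 'c' * 64: b'other namespace',
}

def get_state_by_prefix(state, prefix):
    """Return {address: data} for every non-empty entry under prefix."""
    return {addr: data for addr, data in state.items()
            if addr.startswith(prefix) and data}

print(get_state_by_prefix(state, '1cf126'))  # the two matching entries
```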

boydjohnson (Wed, 31 Oct 2018 13:33:49 GMT):
@arsulegai I would wonder if this would be good for an RFC, https://github.com/hyperledger/sawtooth-rfcs, then there could be greater discussion of the various tradeoffs.

chainsaw (Wed, 31 Oct 2018 14:29:05 GMT):
Has joined the channel.

amundson (Wed, 31 Oct 2018 15:59:52 GMT):
@arsulegai I agree with @boydjohnson that we would need an RFC for this -- but first, maybe we should discuss it here. do you have some example use cases? if you are reading across multiple state nodes, the first thought that occurs to me is that you may have defined the addressing scheme inefficiently.

Dan (Wed, 31 Oct 2018 16:06:22 GMT):
That feature request was submitted by @yoni and he will have better context on the motivations. Arun ( @arsulegai ) is looking at implementing it. I suggested we use jira for fleshing out this feature vs rfcs. I had a couple reasons in mind .. when the issue first arose the RFC process was still new. More importantly I'd like to explore whether for little features it's more efficient to do a little jira review and add the feature rather than a multi-month RFC process.

benoit.razet (Wed, 31 Oct 2018 16:39:48 GMT):
I can see the usefulness of the 2nd part in the description of STL-1375 which is to determine presence or absence of data at an address. I would be interested to know the motivation to extend the get operation to a prefix instead of a full address since it breaks my mental computational model for sawtooth. Is this really a "little feature"? it looks like it would impact the parallel scheduler, and some protos like the state_context.proto.

amundson (Wed, 31 Oct 2018 16:58:52 GMT):
@Dan it changes the stable API surfaces (presumably, including the stable interface between the validator and TPs) and adds a degree of ongoing support burden for the overall team. an RFC is appropriate.

amundson (Wed, 31 Oct 2018 17:05:31 GMT):
I don't have a strong opinion on this feature yet - the JIRA isn't enough to justify changing the API surface, IMO - but that's what we can flesh out here (an understanding of why it's useful)

amundson (Wed, 31 Oct 2018 17:30:50 GMT):
the existence of data at addresses isn't completely sufficient for any use case that handles hash collision (if addressing is done via hashing). you can have an index node, though that is not good from a parallelization perspective. maybe the addressing scheme which makes these features interesting is thus not hashing-based?

amundson (Wed, 31 Oct 2018 17:35:55 GMT):
for example, if I did an existence check on the settings namespace, that's not very informative because of the addressing scheme

benoit.razet (Wed, 31 Oct 2018 17:43:49 GMT):
@amundson do you mean an existence check from a prefix or a full address?

benoit.razet (Wed, 31 Oct 2018 17:49:12 GMT):
the use-case I have in mind is having a contract raise an InvalidTransaction when it unexpectedly finds data at an address. For example, if the address is the result of the hash of a uid, then it would be up to the client to resubmit a transaction with a different uid to avoid collision.

amundson (Wed, 31 Oct 2018 18:07:04 GMT):
that could just be a flag on the set operation to abort if there is already data present though too, if you were going for efficiency

amundson (Wed, 31 Oct 2018 18:08:43 GMT):
but what I mean is, if you get the existence check/list of everything set in the settings namespace, you still have to take the setting name, hash it, and compare. you can't do the reverse, so it won't tell you what settings exist. doesn't seem useful at all with that addressing scheme.

amundson (Wed, 31 Oct 2018 18:11:01 GMT):
if you however had an addressing scheme that wasn't hashing, then maybe that existence list would have more meaning and value for the TP - i.e. give you an index without needing to write an index manually to a specific node (which involves write contention, which is bad)

amundson (Wed, 31 Oct 2018 18:15:49 GMT):
if that part of the tree that you were doing that existence list on had a lot of state nodes present, that would turn into quite a few reads within the validator though, which might make it less efficient overall than storing the index and incurring the write penalty

amundson (Wed, 31 Oct 2018 18:21:39 GMT):
abstractly, it would be easy to estimate the number of reads if you had a use case and knew the estimated number of state nodes. if there were N state nodes, then at worst something like 32*N reads to do that scan (assuming a normal size prefix for state addressing, because we would exclude that). you could potentially exclude the read of the actual data node too, so maybe 31*N worst case. some nodes in the tree would overlap so it would be less than 31*N in practice.
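Rough numbers for that estimate (my assumptions: 70-hex-char addresses, the trie branching on 2 hex chars per level, and a 6-char namespace prefix whose ~3 levels are shared/excluded):
```
# Back-of-envelope version of the 31*N / 32*N worst case above.
levels = 70 // 2                  # 35 levels from root to leaf
excluded = 6 // 2                 # ~3 levels covered by the shared prefix
per_leaf = levels - excluded      # 32 reads per state node, worst case
per_leaf_no_data = per_leaf - 1   # 31 if the data node read is skipped

N = 1000                          # say, 1000 state nodes under the prefix
print(per_leaf * N, per_leaf_no_data * N)   # 32000 31000
# actual cost is lower, since paths near the root overlap between leaves
```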

pschwarz (Wed, 31 Oct 2018 20:23:27 GMT):
In the longer run - a breaking change, for sure - would be to modify the messages to actually deal with not-set versus empty data, which are possibly two separate states

pschwarz (Wed, 31 Oct 2018 20:25:00 GMT):
@benoit.razet I think your use case is currently supported, as long as you're not writing empty data

benoit.razet (Thu, 01 Nov 2018 12:26:03 GMT):
@amundson a flag on the get operation indicating it should return a boolean for presence of data would not hurt, in addition to the flag on the set operation ;) I'm fine with the interface as it is today; it works for my use-case no problem @pschwarz. I was just thinking of what extension could save transfer of data between the validator and tp, but I can always burn one more address containing exclusively this boolean.

arsulegai (Thu, 01 Nov 2018 13:47:01 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=pSuMPrta9ifoqttYr) @benoit.razet Yes, there'll be a new addition to the proto file for the second part. However, isn't parallel scheduling dependent on the input/output addresses of the transaction header? (I couldn't understand how context.get_state() will impact scheduling.) A partial prefix can be in the input addresses of the transaction header, so the TP has access to read data at addresses having this prefix. Please correct me if I misunderstood something here.

arsulegai (Thu, 01 Nov 2018 13:55:56 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=F8fz6ih3GhajpL6sR) @amundson Agreed. A question extending this use case: what would be an efficient way to know how many settings entries are present?

jsmitchell (Thu, 01 Nov 2018 14:12:04 GMT):
There is no index, so you'd have to count them

amundson (Thu, 01 Nov 2018 15:06:16 GMT):
@benoit.razet yeah, I don't see any downside to the set-if-not-set boolean and was-set return flag other than maybe just making the api bigger, if it proved useful. in most of the apps we have done though, we use hashing and handle hash collisions, so neither is useful because you have to do a read and unpack a list.

amundson (Thu, 01 Nov 2018 15:11:21 GMT):
@arsulegai the point about interference with the parallel scheduler was, I think, commentary on setting input/output to prefixes instead of specific nodes. if you set input to a prefix, any write to a node in that prefix will cause serialization because you have contention between the transactions. similar but worse if you set output to a prefix. it's a feature but also a trade-off.

amundson (Thu, 01 Nov 2018 15:24:46 GMT):
@arsulegai If the goal is to allow searching or iterating through a set of state nodes then we would probably want to use an iterator-style pattern and not a list-all-the-addresses pattern. something that can handle the large address space available (2^(8*35)?), or a fair chunk of it.

amundson (Thu, 01 Nov 2018 15:25:48 GMT):
@arsulegai thoughts on the use case for this?

amundson (Thu, 01 Nov 2018 15:27:34 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=fc7f3a8b-62b2-4e05-8472-604df2d945df) @arsulegai No, it would not tell you how many settings entries are present because there may be multiple settings stored at any given node due to hash collisions. (Also, not interesting to know the number of settings set.)

jsmitchell (Thu, 01 Nov 2018 15:30:07 GMT):
for a sparse tree, a depth first search iterator would be fine, and I believe one is already present in the merkle trie implementation. It would just be a question of exposing the interface. Once you got above a certain number of leaf nodes or a certain overall response size (in bytes), you'd probably want to bail out of the call (or do something like cursor/pagination).
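Something like this toy version (a dict-based trie with a bail-out cap; my sketch, not the validator's actual merkle trie code):
```
# Depth-first leaf iterator over a dict-based trie with a bail-out cap.
# Node shape (assumed): {'data': bytes or None, 'children': {token: node}}
def iter_leaves(root, prefix='', max_results=1000):
    stack = [(root, prefix)]
    count = 0
    while stack:
        node, path = stack.pop()
        if node.get('data'):
            yield path, node['data']
            count += 1
            if count >= max_results:
                return  # caller would resume from a cursor here
        # push children in reverse so traversal stays lexicographic
        for token in sorted(node.get('children', {}), reverse=True):
            stack.append((node['children'][token], path + token))
```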

amundson (Thu, 01 Nov 2018 15:33:56 GMT):
what do you do with transactions that take forever to run, because they are scanning the state tree? I'm not sure we have anything protecting against that currently. Probably, if we add this, we should add something that causes transactions that make more than X iterator calls to fail, so it's deterministically capped.

amundson (Thu, 01 Nov 2018 15:35:51 GMT):
even without this iterator, from a sabre point-of-view, capping the number of requests would be good.

amundson (Thu, 01 Nov 2018 15:36:32 GMT):
@agunde ^

Dan (Thu, 01 Nov 2018 16:48:49 GMT):
@yoni if you are around can you comment here or in the jira (preferably in the jira) on the motivating use case? (https://jira.hyperledger.org/browse/STL-1375)

MatthewRubino (Fri, 02 Nov 2018 13:20:21 GMT):
Has left the channel.

Dan (Fri, 02 Nov 2018 14:34:37 GMT):
I haven't run tests locally in forever. I've been leaning on jenkins. Saw there was a mailing list question though and I can repeat that failure on my system. Is anyone successful with bin/run_tests locally?
```
Aborting on container exit...
ERROR:__main__:Test error in UNIT-CLI
ERROR:__main__:Command '['docker-compose', '-p', 'latest', '-f', './cli/tests/unit_cli.yaml', 'up', '--abort-on-container-exit']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/Users/dcmiddle/project/stl/sawtooth-core/bin/run_docker_test", line 126, in main
    timeout=timer.remaining())
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['docker-compose', '-p', 'latest', '-f', './cli/tests/unit_cli.yaml', 'up', '--abort-on-container-exit']' returned non-zero exit status 1.
INFO:__main__:Container latest_unit-cli_1 ran image cli-tests:latest with install-type
INFO:__main__:Shutting down with: ['docker-compose', '-p', 'latest', '-f', './cli/tests/unit_cli.yaml', 'down', '--remove-orphans', '--volumes']
Removing latest_unit-cli_1 ... done
Removing network latest_default
ERROR:__main__:Test UNIT-CLI failed
```

adamludvik (Fri, 02 Nov 2018 17:33:00 GMT):
I believe if you do `./bin/run_tests` locally from a clean system, it will build all the docker images you need automatically for you, but this build time gets counted as part of the test time, so the test can time out before it actually starts. @rbuysse can you confirm?

rbuysse (Fri, 02 Nov 2018 18:12:02 GMT):
I can deny

rbuysse (Fri, 02 Nov 2018 18:12:24 GMT):
you noticed that behavior a while ago and I did a wicked good PR to fix https://github.com/hyperledger/sawtooth-core/commit/e8f1749ad390dd5b8db9de9333fe3da82791a6d7#diff-6c20dc43cd4f6ca422048e2a848211d9

Dan (Fri, 02 Nov 2018 18:57:08 GMT):
I did build then up then run tests...
```
docker-compose -f docker/compose/sawtooth-build.yaml build
docker-compose -f docker/compose/sawtooth-build.yaml up
bin/run_tests
```

arsulegai (Sat, 03 Nov 2018 10:17:11 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=XzgM8W6MkacNhFAdG) @amundson hmm, I agree with this. But my point earlier was that, as such, an SDK API to read several addresses at once (as long as the transaction header in the batch list allows it) would not affect the scheduler.

arsulegai (Sat, 03 Nov 2018 10:18:25 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=bQxzBQyqeahHe9jtA) @amundson I would also request @yoni to comment on this.

Isaiah_Kim (Sat, 03 Nov 2018 14:31:59 GMT):
Has joined the channel.

eugene-babichenko (Mon, 05 Nov 2018 10:14:41 GMT):
Has joined the channel.

eugene-babichenko (Mon, 05 Nov 2018 10:17:43 GMT):
Hello, I would like to have a discussion about one thing regarding working with seeds/peers. Our development team wants to have the possibility to add peers manually (for example, by invoking a REST API method). If the core team likes the feature I would create a pull request to the core repository with the implementation of this functionality.

ddhulla (Mon, 05 Nov 2018 10:46:00 GMT):
Has joined the channel.

Dan (Mon, 05 Nov 2018 19:16:40 GMT):
I think that's an interesting idea. In terms of administration, we have the `sawadm` CLI which affects the local filesystem. The `sawnet` command requests from multiple nodes (but does not change them) .. kind of a monitoring feature. The sawtooth processes are controlled (in ubuntu at least) using the OS mechanisms (systemd, systemctl, ..). Where do you think a remote call to change local peers would be implemented? As far as the general process for adding a feature: after gathering feedback here you probably need to write up the approach as an RFC to gain wider feedback. That's usually the case for changing established APIs or significant new feature sets. See: https://github.com/hyperledger/sawtooth-rfcs and any of the PRs (open and/or closed) on that repo.

arsulegai (Tue, 06 Nov 2018 03:44:38 GMT):
@amundson @jsmitchell motivation is updated by @yoni in https://jira.hyperledger.org/browse/STL-1375

eugene-babichenko (Tue, 06 Nov 2018 07:54:32 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=wk4rG3rGHhE5PoJMm) @Dan Thank you for the feedback! I think the overall idea of the implementation would be the following. We already have the method `Gossip.add_candidate_peer_endpoints` and I think it is suitable for implementing such a feature (I could be wrong). Then I will add new protobuf definitions with something like `AddPeersRequest` (containing the list of peers to be added) and `AddPeersResponse`. With that I should be able to implement a corresponding handler in `sawtooth_validator.state.client_handlers`. With that handler it will be fairly easy to add `POST /peers` to the REST API and implement `sawnet peers add` on top of that method.
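A rough sketch of the handler piece (the message and class names are the proposed ones above, not existing code; only `add_candidate_peer_endpoints` exists today):
```
# Hypothetical handler: feeds endpoints from an AddPeersRequest into the
# gossip layer via the existing Gossip.add_candidate_peer_endpoints.
class AddPeersRequestHandler:
    def __init__(self, gossip):
        self._gossip = gossip

    def handle(self, connection_id, message_content):
        # message_content would be a parsed AddPeersRequest with a
        # repeated `peer_endpoints` field, e.g. ["tcp://host:8800"]
        endpoints = list(message_content.peer_endpoints)
        self._gossip.add_candidate_peer_endpoints(endpoints)
        # would return an AddPeersResponse with an OK/ERROR status
```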

Dan (Tue, 06 Nov 2018 14:04:16 GMT):
who would have rights to add a peer?

eugene-babichenko (Tue, 06 Nov 2018 15:02:24 GMT):
Any REST API user I think. This is acceptable for us because we do not expose the REST API and use it only internally. This should not be a problem for Sawtooth too, because the documentation suggests using proxy servers to restrict access to a publicly exposed REST API server.

Dan (Tue, 06 Nov 2018 15:26:59 GMT):
That's probably too permissive. In the existing system, you need administrative access to alter the validator process including its local configuration and peer assignment. If this was wide open then anyone within your intranet could kill your validator.

Dan (Tue, 06 Nov 2018 15:28:13 GMT):
I think this kind of feature would be good to add to a management console. Just need to think through some things like the appropriate security policies.

eugene-babichenko (Tue, 06 Nov 2018 16:20:10 GMT):
Ok, I got your point. Maybe it should be based on key permissioning with signed requests? Following such approach we can just add a role called `admin`, use it for off-chain permissioning on `POST /peers` method and work with it as described here https://sawtooth.hyperledger.org/docs/core/nightly/master/sysadmin_guide/configuring_permissions.html

Dan (Tue, 06 Nov 2018 18:50:01 GMT):
yeah that sounds like the right direction... extending the off-chain permissioning from transactors to validator-admins or something like that. I'm not sure whether the / a rest service should be cognizant of that or whether that decision is solely evaluated at the validator process. I'm inclined to think that this permissioning would be defined solely at the validator and that securing the rest endpoint would be a separate matter through separate means.. defense in depth etc. There should also be some discussion about whether this functionality would be added to the example REST API. That service gets treated like a default component by many users, while others deploy only custom rest services. You might see if others chime in on this here over the next few days. You could also consider summarizing the discussion for the mail list for those who don't monitor this channel. Once you feel like you have a good sense of the problem and solution you can put up the RFC.

boydjohnson (Wed, 07 Nov 2018 21:08:28 GMT):
I noticed this inconsistency in the validator tp api and a difference in the Rust and Python SDKs around empty state addresses.
```
Validator TP Interface
  transaction has wildcarded inputs:
    Get_State and the address is not in state, then entries: []
  transaction has full addresses as inputs:
    Get_State and the address is not in state, then entries: [Entry(address, data='')]

Python Sdk
  entries: [] or [Entry(address, data='')]  then  entries: []

Rust Sdk
  entries: []                         then  Err(ContextError::ResponseAttributeError)
  entries: [Entry(address, data='')]  then  Ok(None)
```

amundson (Thu, 08 Nov 2018 15:15:06 GMT):
@eugene-babichenko seems like a good feature. I assume this would only apply when using static peering?

amundson (Thu, 08 Nov 2018 15:16:51 GMT):
not sure about whether it's appropriate for the REST API, we don't otherwise use it for administration

amundson (Thu, 08 Nov 2018 15:20:17 GMT):
@agunde any thoughts on this? ^

agunde (Thu, 08 Nov 2018 15:34:39 GMT):
I like the idea. The goal is essentially to be able to add new peers to a static network without needing to stop a node and edit the config file? (or command line) It does seem like a good candidate for expanding the permissioning. I would suggest coming up with a more specific role than `admin` (as this will need to be checked by the validator itself). Possibly `admin.peers`, which would open up other such use cases. Another thing to keep in mind is that off-chain permissioning is only local to that specific validator, so it would be good to think about whether there is a use case where you may want to put this setting on chain (using the Identity transaction family) so that all validators have the same permissioning.

boydjohnson (Thu, 08 Nov 2018 16:24:12 GMT):
I am thinking of the batch injection issue. In the Python validator, a batch injector implements a Python ABC, and the batch injector factory does dynamic imports of the python module. I am wondering what thoughts there are on loading a shared library that uses a Rust interface to do batch injection. That would decouple the batch injector from the validator. It could be similar to Redis with its modules system in Redis 4, or postgresql and modules like Postgis and pgrouting.

Dan (Thu, 08 Nov 2018 21:17:22 GMT):
@ltseeley fyi, on my local system I'm getting a unit test failure on your backport PR branch.

ltseeley (Thu, 08 Nov 2018 21:17:22 GMT):
Has joined the channel.

Dan (Thu, 08 Nov 2018 21:17:52 GMT):
```
ERROR:__main__:Test error in TEST-PEER-LIST
ERROR:__main__:Command '['docker-compose', '-p', 'latest', '-f', '/Users/dcmiddle/project/stl/sawtooth-core/integration/sawtooth_integration/docker/test_peer_list.yaml', 'up', '--abort-on-container-exit']' returned non-zero exit status 127.
```

Dan (Thu, 08 Nov 2018 21:18:28 GMT):
might be a build failure on my end.

Dan (Thu, 08 Nov 2018 21:19:50 GMT):
```devmode-1_1 | /project/sawtooth-core/sdk/examples/devmode_rust/bin/devmode-rust: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory ```

ltseeley (Thu, 08 Nov 2018 21:20:36 GMT):
Yeah, that test works for me locally

rbuysse (Thu, 08 Nov 2018 22:08:55 GMT):
do you have sawtooth-devmode and sawtooth-devmode-local images?

Dan (Thu, 08 Nov 2018 22:09:20 GMT):
```
$ docker image list '*devmode*'
REPOSITORY                    TAG      IMAGE ID       CREATED          SIZE
sawtooth-devmode-rust-local   latest   34fde9f29896   27 minutes ago   1.22GB
sawtooth-devmode-local        latest   5fdf70ec2c81   10 days ago      1.18GB
```

Dan (Fri, 09 Nov 2018 19:29:00 GMT):
FWIW I got a clean build and test pass locally on Logan & Boyd's backport branch. Just confirming what Jenkins already produced.

Dan (Fri, 09 Nov 2018 19:35:19 GMT):
I guess the more meaningful thing is to test the poet repo against those binaries. That poet build is still :runner_tone5:

Dan (Sat, 10 Nov 2018 00:31:16 GMT):
I think test-dynamic-network is passing but I do see protobuf related errors. The liveness test is failing and I see the same errors there:
```
poet-0_1 | [2018-11-09 23:04:43.049 DEBUG engine] Received message: CONSENSUS_NOTIFY_PEER_CONNECTED
poet-0_1 | [2018-11-09 23:04:43.050 ERROR engine] Unknown type tag: CONSENSUS_NOTIFY_PEER_CONNECTED
poet-0_1 | [2018-11-09 23:04:43.299 DEBUG engine] Received message: CONSENSUS_NOTIFY_PEER_CONNECTED
poet-0_1 | [2018-11-09 23:04:43.299 ERROR engine] Unknown type tag: CONSENSUS_NOTIFY_PEER_CONNECTED
```
That message type has been around for 7 months so I don't know what's up there.

Dan (Sat, 10 Nov 2018 01:33:23 GMT):
Ok so that's probably a red herring (gray herring for reference: :fish: ) Poet engine should probably just ignore those messages. (in fact I can't see where that message is used by any consensus).

Dan (Sat, 10 Nov 2018 15:32:57 GMT):
https://pastebin.com/eCbTPB0n

Dan (Sat, 10 Nov 2018 15:33:32 GMT):
validator-0_1 | thread 'PublisherThread' panicked at 'BlockInjector.block_start failed: PyErr { ptype: , pvalue: Some("a bytes-like object is required, not 'BlockHeader'"), ptraceback: Some() }', libcore/result.rs:1009:5

eugene-babichenko (Mon, 12 Nov 2018 09:39:50 GMT):
@amundson Yes, this is to manage a system with static peering without having to restart a node. @agunde Not sure if this permissioning should be on-chain. I think I will prepare an RFC to discuss that further.

amundson (Mon, 12 Nov 2018 15:52:22 GMT):
@eugene-babichenko sounds good

nage (Mon, 12 Nov 2018 16:55:49 GMT):
Would anyone from the Sawtooth community care to comment on this indy-hipe that changes our "rust-like rfc" process to try to make it easier? https://github.com/hyperledger/indy-hipe/pull/56/files I'm hoping we can have an emerging consensus on how to best handle these repos to ease cross-project collaborations at Hyperledger, and am looking for perspectives from outside Indy before we make substantial changes.

amundson (Tue, 13 Nov 2018 02:48:58 GMT):
@nage for the numbering, we use the PR number; kind of nice because after the PR is created you know the number immediately

amundson (Tue, 13 Nov 2018 02:52:03 GMT):
oh, I was initially missing the anti-PR piece

amundson (Tue, 13 Nov 2018 02:58:23 GMT):
@nage my initial thought is that the feedback mechanism would be worse with the proposed change

amundson (Tue, 13 Nov 2018 02:59:02 GMT):
and since the entire point is to gather feedback and respond to it, probably not worth the trade-off

arsulegai (Tue, 13 Nov 2018 03:32:38 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=h9HvGfMhdCuCvTfA8) @amundson @jsmitchell any thoughts on the comment updated by @yoni?

socoboy (Tue, 13 Nov 2018 07:29:03 GMT):
Has joined the channel.

socoboy (Tue, 13 Nov 2018 07:32:25 GMT):
Hi everyone, I'm using the latest version of Sawtooth from the commit at Nov 1 - 8a366ad2. I'm having an issue whose root cause and fix I don't know: when I send a failing transaction (one that would be rejected by the Transaction Processor), the transaction is sent continuously to the transaction processor, like it's stuck in a loop, until another successful transaction is verified

socoboy (Tue, 13 Nov 2018 07:32:37 GMT):
If anyone know how to fix it, please help me. Thank you so much

agunde (Tue, 13 Nov 2018 13:32:20 GMT):
If it is generating an InternalError it will be retried. In most cases InternalErrors should not be raised in the TransactionProcessor and should instead return an InvalidTransaction. @socoboy Are you using your own TransactionProcessor? Do you see InternalError logs?
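The distinction in TP code looks roughly like this (a sketch; the two exception classes are the real sawtooth_sdk ones, the handler logic here is made up):
```
import json

from sawtooth_sdk.processor.exceptions import InvalidTransaction, InternalError

class MyHandler:
    def apply(self, transaction, context):
        try:
            payload = json.loads(transaction.payload.decode())
        except ValueError:
            # Bad input from the client: InvalidTransaction rejects the
            # txn permanently; the validator will NOT retry it.
            raise InvalidTransaction('Payload is not valid JSON')
        if 'address' not in payload:
            raise InvalidTransaction("Payload missing 'address'")
        try:
            return context.get_state([payload['address']])
        except Exception as err:
            # Transient infrastructure failure: InternalError IS retried,
            # which is the looping behavior being described here.
            raise InternalError('State read failed: {}'.format(err))
```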

amundson (Tue, 13 Nov 2018 14:10:17 GMT):
@arsulegai it does not cover most of the questions I had asked here

arsulegai (Tue, 13 Nov 2018 14:38:29 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=miuJGpqA4Cvb5ttk7) @amundson Hmm.. ok, thanks

nage (Tue, 13 Nov 2018 15:49:18 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=he9uJD9vuHaDH9DFY) @amundson Thanks. I'll add these comments to the PR

socoboy (Wed, 14 Nov 2018 04:55:37 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=jjjLdx3YgwNi8ZZG7) @agunde Hi @agunde, I just return an InvalidTransactionError like this:
```
if createAccountData.UserPublicAddress == "" {
    return &processor.InvalidTransactionError{Msg: "Public address must not be empty"}
}
```
Let me check the log and let you know in more detail

socoboy (Wed, 14 Nov 2018 07:04:08 GMT):
Hi @agunde, Here are the logs on our TP:
```
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:25.790211 handler.go:64: [DEBUG] fruitchain txn 6562a688a43e7192a046ff8a74731333e8961eac10952f67eb7bed6e307db7c90f1b4b9b3cff39b5d402158ef0d5574df359f6865230e279c66967ceefd24073: type CREATE_PERMISSION
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:25.790239 handler.go:67: [DEBUG] public key 024540fea32947158680192ab7997fd0145b011a3fc27ec56354ea77e6fe763ca1
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:25.790258 handler.go:136: [DEBUG] fruitchain txn create permission permission_public_address:“027878bbcf9223c3701c25035b5454338dae4adce0da329740e2b59d96e9ab36cb” permission_types:BACKEND
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:25.793109 worker.go:74: [WARN] (411f561f-5dae-43f9-ae61-02f29b335417 ) Invalid transaction: User is not allowed to create permission
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:26.794135 handler.go:64: [DEBUG] fruitchain txn 6562a688a43e7192a046ff8a74731333e8961eac10952f67eb7bed6e307db7c90f1b4b9b3cff39b5d402158ef0d5574df359f6865230e279c66967ceefd24073: type CREATE_PERMISSION
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:26.794164 handler.go:67: [DEBUG] public key 024540fea32947158680192ab7997fd0145b011a3fc27ec56354ea77e6fe763ca1
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:26.794182 handler.go:136: [DEBUG] fruitchain txn create permission permission_public_address:“027878bbcf9223c3701c25035b5454338dae4adce0da329740e2b59d96e9ab36cb” permission_types:BACKEND
sawtooth-fruitchain-tp-go-default | 2018/11/14 04:59:26.797418 worker.go:74: [WARN] (387e6a09-e1aa-439d-a148-9bc4c9a45ace ) Invalid transaction: User is not allowed to create permission
```
I return an InvalidTransactionError

socoboy (Wed, 14 Nov 2018 07:04:27 GMT):
And it continuously verifies the transaction

agunde (Wed, 14 Nov 2018 14:08:35 GMT):
Any clues from the validator logs?

agunde (Wed, 14 Nov 2018 14:12:23 GMT):
Also which version of sawtooth are you using?

anoopc444 (Thu, 15 Nov 2018 07:01:58 GMT):
Has joined the channel.

anoopc444 (Thu, 15 Nov 2018 07:03:07 GMT):
Hi All, I am pretty new to sawtooth development. Could anyone guide me on the required system configuration to start sawtooth setup and development, based on your prior experience?

amolk (Thu, 15 Nov 2018 10:32:48 GMT):
@anoopc444, this is a question for the 'sawtooth' channel. You can find the info here: https://sawtooth.hyperledger.org/docs/core/releases/latest/app_developers_guide.html. No special system requirements except Ubuntu 16.04, docker, docker-compose etc.

JayeshJawale2 (Mon, 19 Nov 2018 09:11:10 GMT):
Has joined the channel.

ZorbaGrue (Fri, 23 Nov 2018 09:47:57 GMT):
Hi. I was able to start a single node sawtooth network by following the steps mentioned in the docs. The following is specified in the documentation: "Any work done in this environment will be lost once the container exits. To keep your work, you would need to take additional steps, such as mounting a host directory into the container. See the Docker documentation for more information." I looked into the Docker guide, and in order to use volumes to share data between container/host you have to specify the target file/directory within the container. Here is how my validator looks:
```
validator:
  image: docker.io/hyperledger/sawtooth-validator
  entrypoint: "bash -c \"\
    sawadm keygen && \
    sawtooth keygen my_key && \
    sawset genesis -k /root/.sawtooth/keys/my_key.priv && \
    sawadm genesis config-genesis.batch && \
    sawtooth-validator -vv \
      --endpoint tcp://validator:8800 \
      --bind component:tcp://eth0:4004 \
      --bind network:tcp://eth0:8800 \""
  ports:
    - "5000:4004"
  networks:
    backend:
  volumes:
    - /sawtooth-data:/var/lib/sawtooth
```
After the container starts, /sawtooth-data populates with files such as:
```
-rw-r--r-- 1 root root 1099511627776 Nov 23 10:18 block-00.lmdb
-rw-r--r-- 1 root root          8192 Nov 23 10:18 block-00.lmdb-lock
-rw-r--r-- 1 root root           128 Nov 23 10:18 block-chain-id
-rw-r--r-- 1 root root 1099511627776 Nov 23 10:18 merkle-00.lmdb
-rw-r--r-- 1 root root          8192 Nov 23 10:18 merkle-00.lmdb-lock
-rw-r--r-- 1 root root 1099511627776 Nov 23 10:18 txn_receipts-00.lmdb
-rw-r--r-- 1 root root          8192 Nov 23 10:18 txn_receipts-00.lmdb-lock
```
At the moment, when the container stops and starts again, data is not persisted even though the volume is mapped as suggested. Can you please provide your thoughts regarding this? Maybe the target /var/lib/sawtooth is not the one which needs to be mapped, other settings may be required, etc. Thank you.

adamgering (Sun, 25 Nov 2018 23:58:17 GMT):
There's an error on line 100 of python/sawtooth_sdk/processor/context.py: `addresses = [e.address for e in state_entries]`. There is no address attribute; address is the key value of the state_entries dictionary. Also, it wouldn't tell you the specific address you weren't authorized to write to, although that's less of an issue than returning the correct error message: `raise AuthorizationException('Tried to set unauthorized address: {}'.format(addresses))`. Should I open a ticket on JIRA?
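A self-contained sketch of the fix being described (names mirror the report; this is illustrative, not the actual sawtooth_sdk patch):
```
class AuthorizationException(Exception):
    pass

# per the report, state_entries is a dict of {address: data}
state_entries = {'1cf126' + 'a' * 64: b'data'}

# broken: dict keys are strings, which have no .address attribute
# addresses = [e.address for e in state_entries]

# fixed: the keys themselves are the addresses
addresses = list(state_entries.keys())
raise AuthorizationException(
    'Tried to set unauthorized address: {}'.format(addresses))
```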

adamgering (Mon, 26 Nov 2018 00:06:32 GMT):
Presently it returns <>

bochuxt (Mon, 26 Nov 2018 07:11:28 GMT):
Has joined the channel.

nishanthkp (Mon, 26 Nov 2018 10:14:12 GMT):
Has joined the channel.

amundson (Mon, 26 Nov 2018 18:21:09 GMT):
@adamgering JIRA and/or PR welcome

aedigix (Tue, 27 Nov 2018 04:53:14 GMT):
Has joined the channel.

kdenhartog (Tue, 27 Nov 2018 19:52:24 GMT):
Has left the channel.

agoops (Wed, 28 Nov 2018 23:16:35 GMT):
Has joined the channel.

eugene-babichenko (Fri, 30 Nov 2018 14:03:44 GMT):
Hi, just curious: how long does the RFC process usually take?

agunde (Fri, 30 Nov 2018 14:10:29 GMT):
@pschwarz @amundson @Dan @jsmitchell @adamludvik Can we get some reviews on @eugene-babichenko's RFC PR? https://github.com/hyperledger/sawtooth-rfcs/pull/32

Dan (Fri, 30 Nov 2018 14:14:38 GMT):
@kodonnel you may also be interested in ^

kelly_ (Tue, 04 Dec 2018 17:24:07 GMT):
@amundson do we have an expected path for the sawtooth 1.1 release notes? would like to link to them in the blog

kelly_ (Tue, 04 Dec 2018 17:31:39 GMT):
oh nevermind i see the pull request

kelly_ (Tue, 04 Dec 2018 17:31:55 GMT):
@adamludvik - do you know what the URL will be when the bumper notes request gets merged on sawtooth-website

kelly_ (Tue, 04 Dec 2018 17:33:17 GMT):
looks like https://sawtooth.hyperledger.org/release/bumper/ maybe?

adamludvik (Tue, 04 Dec 2018 17:33:57 GMT):
one second while I rewind my brain

kelly_ (Tue, 04 Dec 2018 17:34:38 GMT):
or https://sawtooth.hyperledger.org/release/bumper/notes.html

kelly_ (Tue, 04 Dec 2018 17:34:41 GMT):
ok thanks :)

adamludvik (Tue, 04 Dec 2018 17:39:49 GMT):
I believe https://sawtooth.hyperledger.org/release/bumper/notes.html should work, the permalink specified is https://sawtooth.hyperledger.org/release/bumper/. Trying to verify this by rebuilding the website locally.

adamludvik (Tue, 04 Dec 2018 17:40:23 GMT):
Okay, verified locally that you want https://sawtooth.hyperledger.org/release/bumper/

kelly_ (Tue, 04 Dec 2018 17:40:32 GMT):
awesome, thanks adam!

adamludvik (Tue, 04 Dec 2018 17:40:56 GMT):
There will also be a new "Releases" tab in the banner at the top of the site.

kelly_ (Thu, 06 Dec 2018 18:55:36 GMT):
@amundson - i assume you've seen this: https://blog.rust-lang.org/2018/12/06/Rust-1.31-and-rust-2018.html

kthblmfld (Sun, 09 Dec 2018 02:40:24 GMT):
Has joined the channel.

paul.sitoh (Sun, 09 Dec 2018 13:31:08 GMT):
Folks, I was examining the core code and I noticed that there are folders `sdk`, `protos`, `processor`. I am considering building a TP in Go, but having seen sawtooth-sdk-go, I notice that there are also `protos` and `processor` folders there. It seemed somewhat confusing as to which to use: the stuff in the core code or the independent sdk repos? Also, I presume protos is meant to be shared across multiple SDKs. I was wondering, wouldn't it be better for it to be only in core and not copied across multiple repos?

Dan (Sun, 09 Dec 2018 16:18:48 GMT):
Yeah having a single repo for protos would have been another way to go. The SDKs that remain in core are those necessary for core. The other languages were all split out. So if you want to work with go you can work exclusively with the go repo.

JSSilva (Mon, 10 Dec 2018 14:43:29 GMT):
Has joined the channel.

DatNguyen (Tue, 11 Dec 2018 07:22:05 GMT):
Has joined the channel.

arsulegai (Sat, 15 Dec 2018 04:09:39 GMT):
Thought: How about moving consensus SDK related code out of sawtooth-core/sdk/rust or sawtooth-core/sdk/python to something like sawtooth-core/consensus/sdk/rust. Also move sawtooth-core/sdk/examples/devmode_rust to sawtooth-core/consensus/examples/devmode_rust?

rbuysse (Sat, 15 Dec 2018 20:02:13 GMT):
there have been suggestions to move the SDKs out of the core repo altogether

arsulegai (Sun, 16 Dec 2018 15:19:21 GMT):
Oh, yeah! Moving the rust and python SDKs out of sawtooth-core is a good option too

amundson (Mon, 17 Dec 2018 00:36:47 GMT):
@arsulegai curious about the motivation. how does this help you? splitting the consensus sdk out of the primary sdk would be possible but it seems fine to have them combined too.

arsulegai (Mon, 17 Dec 2018 19:25:36 GMT):
@amundson no problems or hard feelings about the current structure, it's just that the following things made me think there's scope for improvement: 1. It gives a logical partition separating application developers from consensus development (more of the framework side). 2. Having both SDKs in the same cargo project ties their versions together; in most cases changes in the consensus part may not require us to upgrade the application development SDK. 3. It's rare that one uses modules from both the consensus and application SDKs.

amundson (Mon, 17 Dec 2018 20:44:18 GMT):
@arsulegai I agree with all those points. the counter-point is that it's more work to maintain another SDK, which is why it is where it is currently, and there are few (if any) technical downsides to it being in one SDK.

amundson (Mon, 17 Dec 2018 20:44:35 GMT):
there are quite a few negatives for the sdk to remain in the core repo though

Eddiwar (Mon, 17 Dec 2018 23:56:09 GMT):
Has joined the channel.

Dan (Wed, 19 Dec 2018 15:01:28 GMT):
FYI @arsulegai @manojgop and rajeev (sorry I can't find his handle here) reported validator crashes while testing PoET2. It's unclear whether it's instability in validator master, the rust consensus SDK, or the poet2 code. From the sound of it I would guess it's related to the validator. The crash happens after thousands of blocks, somewhere in the scheduler, after it crosses an FFI boundary. There are logs and more details to be shared here. My info is ~1 day old at this point so I'll rely on Arun et al to update here. Is anyone running LR on master with a rust consensus?

manojgop (Wed, 19 Dec 2018 15:01:28 GMT):
Has joined the channel.

arsulegai (Wed, 19 Dec 2018 18:58:27 GMT):
^ @rranjan3

rranjan3 (Wed, 19 Dec 2018 18:58:27 GMT):
Has joined the channel.

pschwarz (Wed, 19 Dec 2018 21:29:45 GMT):
Is PoET2 examining past state at all?

pschwarz (Wed, 19 Dec 2018 21:30:42 GMT):
I'm just wondering if it's in a situation where it's walking back too far relative to state pruning

rranjan3 (Thu, 20 Dec 2018 13:09:33 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=Jr6cydrmdNWbjYy2B) No this is not being examined in PoET2

rranjan3 (Thu, 20 Dec 2018 13:38:27 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=BEt7pf8rDxf6z58P3) @pschwarz This is possible, but there are no errors in the validator logs (debug enabled) along those lines. Another thing is that this issue has been hit on a run with a low number of blocks as well (~200), which is less than the default *state_pruning_block_depth* of 1000. But yes, at all other times the number of committed blocks in the chain is in the range of a few thousand (3k-15k).

jsmitchell (Thu, 20 Dec 2018 15:31:07 GMT):
what's the error?

rranjan3 (Thu, 20 Dec 2018 15:56:10 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=bmYv4gSupcm3GQcwd) @jsmitchell No error is seen in the validator logs. The docker container would get killed after printing the last log as - _[23:58:40.246 [Dummy-30] (unknown file) DEBUG] [src/journal/block_scheduler.rs: 166] Adding block 7c2aaabe0875d65e090ccdaa8bdfdd67646bcfc5a4cc193ea5af8fddaabb9579105cc60e9880f31ca0f1af85a85d54a2b317bdcd80118716aa6d3940abcdd1e3 for processing_

rranjan3 (Thu, 20 Dec 2018 16:07:25 GMT):
Tried adding custom logs to drill down. The last trace that could then be seen was in state validation - https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/scheduler/py_scheduler.rs#L67

jsmitchell (Thu, 20 Dec 2018 16:07:26 GMT):
is the validator dumping core? have you taken a look at the backtraces in the core file?

rranjan3 (Thu, 20 Dec 2018 17:12:48 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=WmJZh3HCYYKJDLhLh) @jsmitchell There are other one-off issues where I have seen it run into a segmentation fault and then dump core, but not yet in this case.

jsmitchell (Thu, 20 Dec 2018 17:24:45 GMT):
well, there is something happening to cause the validator process to exit
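
Since the process exits with nothing useful in the logs, one cheap diagnostic (an illustrative sketch, not existing validator code: where exactly to hook this into sawtooth-validator is an assumption) is to enable the stdlib `faulthandler` module near the validator's entry point, so a crash on the native/FFI side still dumps the Python-level thread stacks:

```python
# Illustrative sketch: faulthandler is Python 3.3+ standard library; wiring it
# into sawtooth-validator's startup is an assumption, not an existing patch.
import faulthandler
import signal
import sys

# On SIGSEGV/SIGFPE/SIGABRT/SIGBUS (e.g. a crash across the Rust FFI
# boundary), print the tracebacks of all Python threads before exiting.
faulthandler.enable(file=sys.stderr, all_threads=True)

# Also allow an on-demand dump (`kill -USR1 <pid>`) to see where threads
# are stuck when the process hangs rather than crashes.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)
```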

rranjan3 (Thu, 20 Dec 2018 17:48:04 GMT):
Another observation worth sharing is that with default input rates (for the intkey workload it is 1 tps), setups have survived for pretty long: over 60k blocks is what I have observed in this case, for a network running beyond 36 hours, only to fail with this same issue. We are able to pull this time down to 5-10 hours or 3-15k blocks by increasing the input rates (just enough that backpressure is not hit). Haven't been profiling system resources, but random checks haven't suggested any issue from that perspective.

merq (Sat, 22 Dec 2018 03:39:41 GMT):
Has joined the channel.

LeonardoCarvalho (Wed, 26 Dec 2018 10:28:12 GMT):
Hi guys, did the commit about the [ZMQ Probe](https://github.com/hyperledger/sawtooth-core/commit/0d43a5079904909086148c53201459117c7b4829) make it to the v1.1 branch? Looks like not.

pschwarz (Wed, 26 Dec 2018 18:41:52 GMT):
Can you add a link to the original PR for that? We can use that as reference for discussing adding it to the backport list

LeonardoCarvalho (Sat, 29 Dec 2018 17:08:09 GMT):
Sure, here it is

LeonardoCarvalho (Sat, 29 Dec 2018 17:08:10 GMT):
https://github.com/hyperledger/sawtooth-core/pull/1858

LeonardoCarvalho (Sat, 29 Dec 2018 17:08:27 GMT):
And I did find an interesting site, https://safecurves.cr.yp.to/

LeonardoCarvalho (Sat, 29 Dec 2018 17:09:33 GMT):
Where it claims that secp256k1 is vulnerable to some widely known attacks, what do you all think ?

rstrauck (Fri, 04 Jan 2019 16:35:36 GMT):
Has joined the channel.

manojgop (Tue, 08 Jan 2019 14:25:35 GMT):
I ran sawtooth raft on a 3 node network in docker mode. After running for 3-4 hours I get the below "No response" message on one of the validator nodes, and it removes the connection as per the logs below. I guess this is a ZMQ connection loss. Any idea why this happens? ---> {"log":"\u001b[32m[2019-01-07 17:46:19.733 INFO interconnect]\u001b[0m \u001b[37mNo response from c47c22595d0785568099d6e58bd263131ef3f7e9c5e8f0a8536c115f5aaedd2c28a0bb5efcbaed61fc6168116a3ef0ac758f67e59960556cca08dd6a9d50e9ca in 582.5916867256165 seconds - removing connection.\u001b[0m\n","stream":"stderr","time":"2019-01-07T17:46:19.791210358Z"} {"log":"\u001b[32m[2019-01-07 17:46:19.828 INFO interconnect]\u001b[0m \u001b[37mNo response from OutboundConnectionThread-tcp://5b09efec6fbc:8800 in 1546882597.113996 seconds - removing connection.\u001b[0m\n","stream":"stderr","time":"2019-01-07T17:46:19.868576709Z"} {"log":"\u001b[36m[2019-01-07 17:46:19.917 DEBUG dispatch]\u001b[0m \u001b[37mRemoved send_message function for connection OutboundConnectionThread-tcp://5b09efec6fbc:8800\u001b[0m\n","stream":"stderr","time":"2019-01-07T17:46:19.918195517Z"} {"log":"\u001b[36m[2019-01-07 17:46:20.720 DEBUG gossip]\u001b[0m \u001b[37mEndpoint has not completed authorization in 10 seconds: tcp://5b09efec6fbc:8800\u001b[0m\n","stream":"stderr","time":"2019-01-07T17:46:20.730162981Z"}
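
(The `\u001b[...m` sequences in those docker json-file logs are just ANSI color codes; a throwaway filter such as the sketch below, with an illustrative log path, makes them readable.)

```python
# Strip ANSI color codes from docker's json-file logs so messages like the
# ones quoted above are readable. The file path is illustrative only.
import json
import re

ANSI = re.compile(r"\x1b\[[0-9;]*m")

with open("validator-container.log") as f:   # one JSON object per line
    for line in f:
        entry = json.loads(line)
        print(ANSI.sub("", entry["log"]), end="")
```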

ltseeley (Tue, 08 Jan 2019 17:56:19 GMT):
@manojgop have you checked the Raft logs to see if there was a failure there?

danintel (Thu, 10 Jan 2019 00:34:21 GMT):
*Editorial comment:* I strongly recommend not changing CLI names after release. For example, `devmode-engine-rust` in release 1.1.x (latest, Bumper) is `devmode-rust` in release 1.2.x (nightly). P.S. I could use a documentation review: PR #1997 https://github.com/hyperledger/sawtooth-core/pull/1997

manojgop (Fri, 11 Jan 2019 09:19:49 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=wFjjepzdNbPJa3N9f) @ltseeley I debugged the raft issue further and found that after a couple of hours the *heartbeat message from the leader takes a long time (4 sec+) to reach the follower*. By default, the heartbeat is set to 200ms and the election timeout to 2 sec. But since the heartbeat from the leader took a long time to reach the follower (reason unknown), the follower times out and starts a new election. Once this happens the process repeats frequently, leader election is not stable, and *during this period no one is publishing blocks and we get queue full errors*. As I understand it, the consensus engine does not send messages to its peer directly; the communication is routed via the validator ZMQ. And all communication between validators (blocks, messages from consensus engines, etc.) happens via a single network interconnect between validators. So it looks like there is some delay in communication. *The ZmqService::send_to() function in the consensus SDK sometimes takes a long time to send the message (the heartbeat msg in the case of Raft)*. It looks like the zmq socket used for communication (in zmq_stream.rs) is getting blocked for some reason. Let me know if you have any thoughts on why messages sent from one consensus engine are delayed in reaching the peer consensus engine.

muniyaraj (Fri, 11 Jan 2019 11:53:47 GMT):
Has joined the channel.

amundson (Fri, 11 Jan 2019 18:28:27 GMT):
@danintel the intent is that they both be devmode-engine-rust. so either there is an issue in master or you have something cached? Should be pretty easy to fix if it's wrong.

danintel (Fri, 11 Jan 2019 18:35:19 GMT):
@amundson That is good to hear--so it's just a temporary issue I can work around. I have Sawtooth nightly 1.2.1.dev171 on Xenial, and apparently nightly packages are no longer produced for Xenial. I'd prefer to use Bionic, but I have never been able to install Sawtooth on it (due to the missing PoET packages).

amundson (Fri, 11 Jan 2019 18:46:15 GMT):
@danintel Our (Bitwise) intent is to focus on bionic moving forward. @rbuysse put in a change which makes it possible to support multiple distributions but it would stretch testing resources thin to support both. (If others want to step up there, great.) As far as the PoET packages, PoET is not stable enough on 1.2+bionic currently to pass PR checks and thus the patches @rbuysse has for producing those packages haven't been merged. Definitely an opportunity for someone to step up and debug that issue.

danintel (Fri, 11 Jan 2019 19:18:50 GMT):
I don't expect backporting nightly to Xenial, but it would be nice to be able to at least install Sawtooth packages on Bionic. Currently, there is a dependency of Sawtooth packages on 5 `python3-sawtooth-poet-*` packages. Could this dependency be removed temporarily? Here is the install error on Bionic: ```$ sudo apt-get install -y sawtooth . . . Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation: The following packages have unmet dependencies: sawtooth : Depends: python3-sawtooth-poet-cli but it is not installable . . .```

rbuysse (Fri, 11 Jan 2019 19:29:59 GMT):
that's just for the meta-package

rbuysse (Fri, 11 Jan 2019 19:30:09 GMT):
you should be able to install individual packages

danintel (Fri, 11 Jan 2019 20:15:48 GMT):
Thanks--so package `sawtooth` is not needed. This subset seems to work OK: `sudo apt-get install python3-sawtooth-cli python3-sawtooth-integration python3-sawtooth-rest-api python3-sawtooth-sdk python3-sawtooth-settings python3-sawtooth-signing python3-sawtooth-validator sawtooth-devmode-engine-rust`

nanspro (Sat, 12 Jan 2019 10:08:26 GMT):
Has joined the channel.

muniyaraj (Sun, 13 Jan 2019 15:57:49 GMT):
Hi all, I'm getting "Registration of [chair 1.0] failed"

muniyaraj (Sun, 13 Jan 2019 15:57:54 GMT):
pls help me

ltseeley (Mon, 14 Jan 2019 15:13:50 GMT):
@manojgop it could be that the network is just backed up at that point; we have seen that happen under a very high workload.

manojgop (Mon, 14 Jan 2019 15:42:11 GMT):
@ltseeley This high network load is also impacting the heartbeat messages between validators; I can see "No response from peer validator" in the logs and validators eventually removing the connections with their peers. I'll reduce the heartbeat messages between raft consensus engines (which also go through the validator network interconnect) and check the behaviour. By default raft sends heartbeat messages every 200ms; I'll increase it to 2 sec and check.

kelly_ (Tue, 15 Jan 2019 00:32:51 GMT):
if anyone has any updates they'd like to get into the next Hyperledger Sawtooth TSC, please @ me and let me know! https://wiki.hyperledger.org/groups/tsc/project-updates/sawtooth-2019-jan

kelly_ (Tue, 15 Jan 2019 00:32:52 GMT):
thanks!

renex (Thu, 17 Jan 2019 15:24:42 GMT):
Has joined the channel.

abgomez (Fri, 18 Jan 2019 22:09:42 GMT):
Has joined the channel.

youngerjo (Tue, 22 Jan 2019 04:09:21 GMT):
Has joined the channel.

rbnaraujo (Fri, 25 Jan 2019 12:23:46 GMT):
Has joined the channel.

Behzad 2 (Sat, 26 Jan 2019 20:40:14 GMT):
Has joined the channel.

moulika (Mon, 28 Jan 2019 11:04:14 GMT):
Has joined the channel.

manojgop (Mon, 28 Jan 2019 17:22:47 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=9W7gdSbPqetevrBYP) @ltseeley @amundson I tried increasing the Raft heartbeat to 4 sec, but I still observe the same behavior after running for about 30+ hours. If we increase the TPS rate to 50 or 100 then we can observe the issue sooner. So it looks like a lot of messages are getting queued for processing in the validator (which includes both consensus engine and validator gossip peer messages). I also tried using a very lightweight TP which just returns success from the apply method in the TP handler without doing any real processing, to rule out any TP bottleneck. Have you tested and profiled the validator at higher TPS, or run Raft for a longer duration? Are there any known validator issues for these scenarios?

jsmitchell (Mon, 28 Jan 2019 17:24:49 GMT):
@manojgop interesting. Have you characterized what types of messages predominate?

jsmitchell (Mon, 28 Jan 2019 17:25:11 GMT):
if you are running the metrics stuff, message type is one of the tags collected for the counters

manojgop (Mon, 28 Jan 2019 17:37:10 GMT):
@jsmitchell I've not enabled metrics yet. Are grafana metrics supported if we run the validator in docker mode? After running for long hours, from the Raft engine side I've seen the block commit message response from the validator taking a long time after Raft calls block_commit(). Also, Raft follower nodes take a long time to get the heartbeat request from the leader, and then there are re-elections multiple times where no leader gets elected, as message processing gets slower at the validator end. I also notice a lot of missing-batch requests from follower nodes, which means gossip messages also take a lot of time to send batches to peers. So once message processing in the validator gets slower (the dispatch incoming queue size in the validator also logs >10/20 messages consistently), things go haywire.

jsmitchell (Mon, 28 Jan 2019 17:38:57 GMT):
yes, you can deliver metrics to influxdb/grafana from a validator running inside docker

jsmitchell (Mon, 28 Jan 2019 17:39:12 GMT):
there should be an example compose file with the relevant containers and the command line args required

jsmitchell (Mon, 28 Jan 2019 17:41:14 GMT):
we might get some clues if we see an unusual type of message dominate

jsmitchell (Mon, 28 Jan 2019 17:41:55 GMT):
for example, we've seen problems in the past with the GET_BATCH_BY_TRANSACTION_ID message for dependency resolution

manojgop (Mon, 28 Jan 2019 17:43:00 GMT):
I'll enable metrics and run it again

manojgop (Mon, 28 Jan 2019 17:48:59 GMT):
But in Sawtooth Raft, a leader can commit a block without ensuring the batches in the block have reached the followers. We only store the block id in the Raft log entry. I guess the assumption here is that a follower will receive the batches in the block via the gossip network from its peers before the leader commits the block. After running raft for more than 24 hrs I've seen the follower nodes sending too many batch requests and the leader node responding to them.

manojgop (Mon, 28 Jan 2019 17:50:13 GMT):
In sawtooth, are all the messages from peer validators (gossip messages) and consensus engine messages processed using the same queue?

jsmitchell (Mon, 28 Jan 2019 17:50:26 GMT):
two separate queues, iirc

manojgop (Mon, 28 Jan 2019 17:51:04 GMT):
But the consensus messages are also going via Validator ZMQ right ?

jsmitchell (Mon, 28 Jan 2019 17:51:34 GMT):
yeah, but it's a separate instance running on a different thread and managed by a different threadpool

manojgop (Mon, 28 Jan 2019 17:51:56 GMT):
Leader consensus <-> Leader validator <-> Follower Validator <-> Follower consensus where <-> = ZMQ

jsmitchell (Mon, 28 Jan 2019 17:52:27 GMT):
oh, yes - the peer messages are always exchanged via gossip

jsmitchell (Mon, 28 Jan 2019 17:52:33 GMT):
even if they are consensus messages

jsmitchell (Mon, 28 Jan 2019 17:52:47 GMT):
i thought you were referring to the engine<->validator messages

manojgop (Mon, 28 Jan 2019 17:53:53 GMT):
"Dispatch incoming queue size" in the validator logs corresponds to all the messages which dispatch handler eventually dispatches to different threadpool ?

jsmitchell (Mon, 28 Jan 2019 17:54:32 GMT):
yes, the depth of the incoming message queue

manojgop (Mon, 28 Jan 2019 17:55:01 GMT):
I meant that all the peer messaging between consensus engines also goes via the validator ZMQ, which is the same path used by gossip messages.

jsmitchell (Mon, 28 Jan 2019 17:55:10 GMT):
yes, correct

manojgop (Mon, 28 Jan 2019 17:56:11 GMT):
So consensus engine messages like heartbeats may get slowed down if there are a lot of gossip messages, and vice versa. Can we assign a priority for messages, or is it FIFO-based?

jsmitchell (Mon, 28 Jan 2019 17:57:18 GMT):
there is a mechanism for priority

jsmitchell (Mon, 28 Jan 2019 17:57:26 GMT):
it's been a long long time since I looked at that code

manojgop (Mon, 28 Jan 2019 17:59:10 GMT):
Even the commit_block() message from consensus to validator may get slowed down. I've seen the validator sometimes taking a lot of time responding to commit_block(). I don't see any option to set message priority using the consensus SDK.

jsmitchell (Mon, 28 Jan 2019 18:02:41 GMT):
https://github.com/hyperledger/sawtooth-core/blob/master/validator/sawtooth_validator/networking/dispatch.py#L145

jsmitchell (Mon, 28 Jan 2019 18:05:50 GMT):
https://github.com/hyperledger/sawtooth-core/blob/master/validator/sawtooth_validator/server/consensus_handlers.py

jsmitchell (Mon, 28 Jan 2019 18:07:35 GMT):
those add_handler calls look like they are defaulting to priority=None

jsmitchell (Mon, 28 Jan 2019 18:08:55 GMT):
@ltseeley @pschwarz ^

jsmitchell (Mon, 28 Jan 2019 18:09:22 GMT):
may have further thoughts

pschwarz (Mon, 28 Jan 2019 18:23:45 GMT):
I wonder if those should be at least medium

pschwarz (Mon, 28 Jan 2019 18:25:58 GMT):
They default to NONE, because the priority can be set separately. For example, if a group of handlers is configured for a particular message type, the priority is set once for that message type

pschwarz (Mon, 28 Jan 2019 18:26:33 GMT):
Most of the MEDIUM level messages are for peering initialization

pschwarz (Mon, 28 Jan 2019 18:26:46 GMT):
Everything else is default priority

pschwarz (Mon, 28 Jan 2019 18:27:20 GMT):
Ping Responses are HIGH, so connections aren't dropped, but they're the only ones

ltseeley (Mon, 28 Jan 2019 18:32:08 GMT):
I was not aware that there was a priority mechanism for handlers. Seems useful though

jsmitchell (Mon, 28 Jan 2019 19:02:09 GMT):
do you guys have some suggestions for @manojgop about what specific message types he should raise the priority of? All the consensus messages? Seems like a good opportunity for some testing since he's able to recreate this issue.

ltseeley (Mon, 28 Jan 2019 19:19:45 GMT):
Hmm, bumping up the priority of the peer (engine-to-engine) consensus messages might be interesting; if that works well, it might prevent new elections occurring in Raft. Might also be interesting to try increasing the priority of all consensus messages (or even just block commits, since those seem to be slowed down).
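
To make the suggested experiment concrete, here is a rough sketch of what bumping the consensus handlers' priority might look like. This is hedged guesswork from the two files linked above: the `Priority` values, the `priority=` keyword on `add_handler`, and the handler/message-type pair shown are all assumptions drawn from this discussion, not verified against master.

```python
# Hypothetical sketch based on the dispatch.py and consensus_handlers.py
# links above; names here are assumptions from this thread, not checked code.
from sawtooth_validator.networking.dispatch import Priority
from sawtooth_validator.protobuf import validator_pb2
from sawtooth_validator.server import consensus_handlers

def register_consensus_handlers(dispatcher, thread_pool, proxy):
    # Today these registrations default to priority=None; the experiment is
    # to raise consensus traffic (or just block commits) above gossip noise.
    dispatcher.add_handler(
        validator_pb2.Message.CONSENSUS_COMMIT_BLOCK_REQUEST,
        consensus_handlers.ConsensusCommitBlockHandler(proxy),
        thread_pool,
        priority=Priority.MEDIUM)
```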

muniyaraj (Wed, 30 Jan 2019 08:42:07 GMT):
Hi all, can you please tell me how to use Sawtooth Explorer? Is there any guidance for that?

LeonardoCarvalho (Wed, 30 Jan 2019 10:54:05 GMT):
Guys, any news on backporting https://github.com/hyperledger/sawtooth-core/pull/1858 ?

pschwarz (Wed, 30 Jan 2019 16:15:28 GMT):
Have you created a backport PR for it?

pschwarz (Wed, 30 Jan 2019 16:16:27 GMT):
Create a branch against 1-1, and cherry pick the commit. When you submit the PR, title it as "BACKPORT 1-1: " and reference the original PR in the description

pschwarz (Wed, 30 Jan 2019 16:16:39 GMT):
That will get it into the review process

LeonardoCarvalho (Wed, 30 Jan 2019 22:29:41 GMT):
ok, thank you!

amundson (Thu, 31 Jan 2019 16:32:21 GMT):
@LeonardoCarvalho if the backport is merged prior to next Tuesday, you could request a release on Tuesday in #sawtooth-release for the following Thursday. I think we have one other fix too that is likely to justify a point release.

rjones (Thu, 31 Jan 2019 19:00:26 GMT):
Has left the channel.

LeonardoCarvalho (Fri, 01 Feb 2019 10:58:28 GMT):
Ok, I'll start learning the procedure.

amundson (Fri, 01 Feb 2019 14:58:00 GMT):
@rbuysse and I are considering merging the python signing library into the python sdk

danintel (Fri, 01 Feb 2019 16:52:48 GMT):
All the Jenkins builds of sawtooth-core seem to be failing. I see this error in my log and other logs: ``` validator/sawtooth_validator/networking/interconnect.py:31:0: E0611: No name 'asyncio' in module 'zmq.auth' (no-name-in-module)``` https://build.sawtooth.me/job/Sawtooth-Hyperledger/job/sawtooth-core/job/PR-2012/4/console File interconnect.py changed 2 months ago, so something else is missing or changed.

amundson (Fri, 01 Feb 2019 17:32:00 GMT):
@danintel this is something @pschwarz and others have been looking into

danintel (Fri, 01 Feb 2019 17:44:44 GMT):
Thanks! CI is great for preventing regression. It seems fragile though. I don't know if it's a Jenkins thing. Travis also seems fragile. Needs a lot of care and feeding.

amundson (Fri, 01 Feb 2019 17:53:38 GMT):
in no way is it a problem with CI

TomBarnes (Fri, 01 Feb 2019 17:55:11 GMT):
The Intel team also has a couple of PRs that they are looking to move forward into a point release.

amundson (Fri, 01 Feb 2019 17:56:45 GMT):
do you know which?

TomBarnes (Fri, 01 Feb 2019 18:15:54 GMT):
@amundson #1995 in sawtooth-core, #6 in cxx sdk, #4 in rust sdk

amundson (Fri, 01 Feb 2019 18:23:04 GMT):
we generally only backport bug fixes and doc updates, so 1995 is probably destined for 1.2 and not a 1.1.x point release

TomBarnes (Fri, 01 Feb 2019 18:27:35 GMT):
I don't think this breaks backward compatibility, and I see other non-breaking api changes in past sawtooth point releases

amundson (Fri, 01 Feb 2019 18:33:23 GMT):
that were new features?

amundson (Fri, 01 Feb 2019 18:33:45 GMT):
AFAIK, these proto APIs haven't even changed since v1.0

TomBarnes (Fri, 01 Feb 2019 18:33:50 GMT):
Extension of existing APIs.

amundson (Fri, 01 Feb 2019 18:34:04 GMT):
which backports?

TomBarnes (Fri, 01 Feb 2019 18:34:25 GMT):
for example, in 1.0.4: "Added `client_max_size` to Sawtooth Rest API Configuration for controlling the size of batches submitted"

amundson (Fri, 01 Feb 2019 18:43:22 GMT):
My interpretation of that was that it fixed a bug around the unintentional 1MB limit and then added a config option so you could set it to get the previous behavior if you wished. But, I don't equate REST API configuration changes to TP protocol changes in any case.

TomBarnes (Fri, 01 Feb 2019 18:45:00 GMT):
it's a non-breaking change that's transparent to the user

TomBarnes (Fri, 01 Feb 2019 18:45:16 GMT):
I think that's the key point

amundson (Fri, 01 Feb 2019 18:50:37 GMT):
Well, within 1.x, we aren't breaking the API -- we won't even consider that prior to 2.0. So your argument holds and that's why we would even allow it into master (and thus 1.2) in the first place. But when we add features, we increment the minor number (1.1->1.2), so that you can write code against 1.1 and it will work with any 1.1.x validator.

amundson (Fri, 01 Feb 2019 18:51:40 GMT):
how urgent is this capability for you?

amundson (Fri, 01 Feb 2019 18:52:56 GMT):
(to be in a release)

TomBarnes (Fri, 01 Feb 2019 18:54:48 GMT):
This one and RFC #34 have significant business urgency. Unless we anticipated a 1.2 release in Q1, waiting for a 1.2 release would create difficulties.

amundson (Fri, 01 Feb 2019 19:04:46 GMT):
RFC 34 doesn't seem to have much support yet. It is problematic for reasons described in the RFC commentary.

amundson (Fri, 01 Feb 2019 19:07:17 GMT):
re: 1.2 in Q1 - haven't really considered it yet. In theory it shouldn't be as heavy-weight process-wise as 1.1 was (which took a lot of effort), but master has some bugs in it that would need to be squashed and an LR7 passed.

amundson (Fri, 01 Feb 2019 19:09:33 GMT):
poet doesn't pass w/master currently, maybe because of a block manager ref counting bug (that's the current theory), though there could be poet-specific bugs too.

pschwarz (Fri, 01 Feb 2019 19:29:31 GMT):
RFC #34 can be worked around, based on feedback in that RFC, if it has business urgency

LeonardoCarvalho (Mon, 04 Feb 2019 11:33:12 GMT):
Just created my PR : https://github.com/hyperledger/sawtooth-core/pull/2014

amundson (Mon, 04 Feb 2019 22:36:49 GMT):
@jsmitchell @pschwarz @agunde @boydjohnson we plan to extend the TP protocol slightly with this PR - https://github.com/hyperledger/sawtooth-core/pull/1995 - in the comments, I raise a concern we would like to get more input on, which is namely how we get TPs using the new protocol to be rejected by previous validators that only talk the old protocol. For example, here we could have a TP requesting RAW and older validators would ignore it and behave incorrectly (well, behave without knowledge of the new protocol elements). We planned for this in our PeerRegisterRequest but apparently not with TPs. Thoughts on implementing a protocol_version for TpRegisterRequest (which would only help if we backported it, this time) or some other possible approaches?

amundson (Mon, 04 Feb 2019 22:41:25 GMT):
@arsulegai ^

arsulegai (Tue, 05 Feb 2019 03:02:07 GMT):
It would probably be TpRegisterResponse to address this concern ^

arsulegai (Wed, 06 Feb 2019 00:45:33 GMT):
Guys, any feedback for this PR?

agunde (Wed, 06 Feb 2019 15:18:53 GMT):
@amundson to handle current TPs that do not provide this protocol_version, would we default them to the old protocol? And then TPs that need Raw would only work with a validator using tp protocol version 2, otherwise the connection will be closed? That sounds fine to me.

agunde (Wed, 06 Feb 2019 15:20:08 GMT):
This will affect the way Sabre handles smart contracts as well, a thing to keep in mind.

nanspro (Wed, 06 Feb 2019 15:26:30 GMT):
Has left the channel.

amundson (Thu, 07 Feb 2019 02:54:17 GMT):
I think that we probably need protocol_version on the request and response. if the validator does not send back the same protocol version that was requested, the sdk could panic, shutting down the tp.

arsulegai (Thu, 07 Feb 2019 03:12:31 GMT):
My views: 1. How about a graceful "unregister" procedure initiated from the SDK instead of a panic? 2. Having protocol_version in the request message would be redundant; would it be fine if hard-coded values are allowed at both the SDK & validator?

amundson (Thu, 07 Feb 2019 04:16:00 GMT):
re:1 - it could be graceful if the field is empty, because that is "protocol 0" (the current protocol) and we know how to unregister. this is the only actual case there should ever be a mismatch between request and response.

amundson (Thu, 07 Feb 2019 04:17:22 GMT):
re:2 - we need it in the request message, because the validator should determine if it can handle that protocol version. if it can not, it should reject the registration - at a minimum, refusing to send any processing requests to the TP.

amundson (Thu, 07 Feb 2019 04:21:13 GMT):
within sawtooth 1.x, we should keep compatibility back to protocol version 0, and for 2.0 we can figure out what we want to do

arsulegai (Thu, 07 Feb 2019 04:24:06 GMT):
Ok, thanks
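
To make the handshake described above concrete, here is a minimal sketch of the SDK-side check. The `protocol_version` fields on TpRegisterRequest/TpRegisterResponse are the proposal under discussion, not an existing API; everything below is hypothetical.

```python
# Hypothetical sketch of the proposed TP registration handshake; nothing here
# exists yet. "Protocol 0" is the current protocol (field absent/empty).
SDK_PROTOCOL_VERSION = 1

def register(connection, request):
    # request is a TpRegisterRequest carrying the proposed protocol_version.
    request.protocol_version = SDK_PROTOCOL_VERSION
    response = connection.send(request)   # -> TpRegisterResponse

    if getattr(response, "protocol_version", 0) != SDK_PROTOCOL_VERSION:
        # An old validator answers with "protocol 0"; since protocol 0 is
        # understood, unregister gracefully instead of panicking.
        connection.send_unregister()
        raise RuntimeError(
            "validator does not support TP protocol version %d"
            % SDK_PROTOCOL_VERSION)
    return response
```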

amundson (Thu, 07 Feb 2019 04:34:43 GMT):
@arsulegai for the sdk-side of things, it should unpack and fill in all the fields in the protobuf (I haven't looked back at the RFC, but I wonder if that's what we intended there) before sending it on to apply(). I have some code I'll add soon to the rust SDK (master) which will remove protobuf objects from the API and this will get more elegant after that change. @boydjohnson also has some slight API-breaking changes for the Rust SDK so we will try and do the changes close together.

arsulegai (Thu, 07 Feb 2019 04:58:46 GMT):
Ok, I was planning to make use of protobuf "oneof" to fill either header/header_bytes to save extra bytes; introducing oneof would make it irreversible. The SDK makes use of the extracted header fields to identify the handler for the apply() call, so there will be a need to deserialize the header in the SDK when the raw header format is requested. I'll monitor the master branch of the rust SDK and follow your lead.

amundson (Thu, 07 Feb 2019 05:08:39 GMT):
we've avoided oneof in the past

amundson (Thu, 07 Feb 2019 05:09:26 GMT):
do you have a diff that shows how you were planning to use it?

arsulegai (Thu, 07 Feb 2019 05:11:16 GMT):
Yes, in my work in progress dev branch https://github.com/arsulegai/sawtooth-core/commit/3bb2a0faccc7b72380322c3fe8bb8b77196442de#diff-05cadfb3661c6c964d6a6b910205d654R99

amundson (Thu, 07 Feb 2019 05:27:40 GMT):
that makes me nervous without a lot of backwards-compatibility tests

amundson (Thu, 07 Feb 2019 05:28:58 GMT):
I was under the impression that empty fields were not in the serialized form, and so this would not actually save any space. we could test that.

arsulegai (Thu, 07 Feb 2019 05:39:03 GMT):
I am afraid of this too; you are correct, there's no space saving with just empty fields.
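
That "we could test that" is a one-liner; for example (assuming the compiled protos from the Python SDK, whose module path may differ):

```python
# Quick check that unset protobuf fields contribute no bytes on the wire.
# The import path is an assumption; adjust to wherever the compiled protos live.
from sawtooth_sdk.protobuf.processor_pb2 import TpProcessRequest

empty = TpProcessRequest()
print(len(empty.SerializeToString()))          # 0 -- unset fields are omitted

with_payload = TpProcessRequest(payload=b"abc")
print(len(with_payload.SerializeToString()))   # only the set field is encoded
```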

amundson (Thu, 07 Feb 2019 05:41:45 GMT):
we won't have our changes in the SDK immediately, so if you want to just remove the oneof and fill in the empty fields on the SDK side when RAW is used, that should work fine. the changes from us are possible sometime next week.

amundson (Thu, 07 Feb 2019 05:44:09 GMT):
the approach is going to be to convert from the protobuf to another struct. I do have some code to do that, but it is currently targeting sawtooth 2.0 and does a bunch of string->bytes conversion which might need to be redone to put it into the sawtooth sdk.

arsulegai (Thu, 07 Feb 2019 05:45:10 GMT):
Ok, sounds good. I will go back to sending header_bytes with empty header if RAW is requested.

amundson (Thu, 07 Feb 2019 15:39:04 GMT):
we plan to start doing contributor meetings soon; @mfford will schedule them, starting sometime after we have started the similar grid meetings.

amolk (Thu, 07 Feb 2019 16:34:33 GMT):
That would be super helpful.

manojgop (Fri, 08 Feb 2019 15:57:32 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=PCM2Q9eQnqWgEoSrn) @jsmitchell @ltseeley @pschwarz @amundson I tried running Raft with metrics enabled. *Issue 1:* What I observe is that whenever the total memory occupied by the validator process gets close to the max system memory, disk I/O happens. This looks to be LMDB writing to disk. Disk I/O is in the order of TBps. At this point TP processing gets slow and message processing time also increases (this can be seen in the grafana graph). Since the intkey client workload is also running, batches queue up and we eventually get queue full errors. So the LMDB disk I/O is slowing the system down. TP processing is also blocked at this time, since the TP calls get/set state on the LMDB merkle tree. *Issue 2:* When the Raft leader node rejects batches forwarded by follower nodes because the leader's queue is full, those batches remain in the follower queues forever (until the leader changes), since only the leader publishes blocks, containing batches from its own queue. Maybe there should be a mechanism for clearing batches from the validator queue if they don't get processed after a configurable number of blocks.

jsmitchell (Fri, 08 Feb 2019 16:04:24 GMT):
What do you mean “TBps”?

manojgop (Fri, 08 Feb 2019 16:04:43 GMT):
(attachment: grafana_metrics_lmdb.JPG)

manojgop (Fri, 08 Feb 2019 16:04:56 GMT):
Terabytes/sec

manojgop (Fri, 08 Feb 2019 16:05:31 GMT):
LMDB size is 1 TB

jsmitchell (Fri, 08 Feb 2019 16:05:41 GMT):
Yeah, as a sparse file...
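
(For anyone puzzled by "1 TB": that is the LMDB map size, not data actually written. A quick way to see the difference, with an illustrative path, is to compare the apparent size against the allocated blocks:)

```python
# Compare apparent size vs. actually allocated bytes for a sparse file.
# The path is illustrative; point it at your merkle/block LMDB file.
import os

st = os.stat("/var/lib/sawtooth/merkle-00.lmdb")
apparent = st.st_size            # what `ls -l` shows (the LMDB map size)
allocated = st.st_blocks * 512   # st_blocks counts 512-byte units on Linux
print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```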

manojgop (Fri, 08 Feb 2019 16:05:47 GMT):
yes

jsmitchell (Fri, 08 Feb 2019 16:05:51 GMT):
What filesystem are you running this on?

jsmitchell (Fri, 08 Feb 2019 16:06:18 GMT):
Is Windows involved at all?

manojgop (Fri, 08 Feb 2019 16:06:33 GMT):
I'm using a Ubuntu VM. I'll rerun this on Intel NUC with SSD. No windows

jsmitchell (Fri, 08 Feb 2019 16:06:46 GMT):
Ubuntu VM on what?

manojgop (Fri, 08 Feb 2019 16:08:45 GMT):
Linux server Virtualization

manojgop (Fri, 08 Feb 2019 16:09:39 GMT):
BTW, to reproduce the issue quickly I limited the docker container memory to 70M. I ran a 5 TPS intkey workload.

manojgop (Fri, 08 Feb 2019 16:11:43 GMT):
After 5-6 mins the container memory gets full and disk I/O happens.

manojgop (Fri, 08 Feb 2019 16:12:18 GMT):
You can see the Read/Write IOPS peaks in the graph

manojgop (Fri, 08 Feb 2019 16:12:59 GMT):
This issue can be reproduced consistently and quickly if we limit the validator docker container memory

manojgop (Fri, 08 Feb 2019 16:14:30 GMT):
But if I use a system with 8 GB RAM then we notice this issue only after running for more than 8 days at 5 TPS, since it takes long for the system memory to be consumed

jsmitchell (Fri, 08 Feb 2019 16:17:52 GMT):
We regularly run long running tests. The behavior we see is regular write IO and almost no read IO until Linux fscache pages fill up available memory. At that point there will be a small, steady amount of read IO to fetch pages off of disk that are not in the cache. We don’t see the spikes you are seeing.

manojgop (Fri, 08 Feb 2019 16:18:57 GMT):
Which disk type do you use? SSD, HDD?

jsmitchell (Fri, 08 Feb 2019 16:32:54 GMT):
SSD. The write workload is intensive — 1k-4k iops depending on how hard you are pushing state

jsmitchell (Fri, 08 Feb 2019 16:33:07 GMT):
HDD won’t cut it

jsmitchell (Fri, 08 Feb 2019 16:33:35 GMT):
But that still should not result in those enormous IO spikes

circlespainter (Sat, 09 Feb 2019 10:27:22 GMT):
Has joined the channel.

arsulegai (Mon, 11 Feb 2019 16:55:25 GMT):
I ran the above scenario on a machine with an SSD; there were no sudden IO spikes, but the outcome is the same

arsulegai (Mon, 11 Feb 2019 17:15:17 GMT):
Coming to observations from the logs: 1. All Raft engines have the same log entry: the leader engine has published a new block and is waiting for follower nodes to confirm the Raft log entry before proceeding. 2. Input tps remained constant, so I see queue full and batches rejected. The machine's memory was used to full capacity. 3. Follower nodes also printed that they received the block and validated it from their end. 4. The dispatcher queue is full and memory usage is at its maximum; after that there is no activity in the network other than heartbeats.

arsulegai (Mon, 11 Feb 2019 17:22:26 GMT):
(attachment: Experiment - limited memory in machine, IO to SSD)

arsulegai (Mon, 11 Feb 2019 17:23:11 GMT):
Machines were limited to a maximum of 70MB

arsulegai (Mon, 11 Feb 2019 17:42:54 GMT):
My initial suspicion is that the propose() from the leader is lost because there is no space left on the follower nodes; the leader has advanced and hence is unable to send the proposal again. This could be wrong as well; there's no definite way to prove it without more logging. Any suggestions?

jsmitchell (Mon, 11 Feb 2019 17:43:23 GMT):
Need to understand the nature of that read IO. What’s causing it?

jsmitchell (Mon, 11 Feb 2019 17:44:54 GMT):
It probably has nothing to do with the lmdb. At 70mb, you are probably running into swap thrashing just with stack+heap on the validator process.

jsmitchell (Mon, 11 Feb 2019 17:45:39 GMT):
I am used to seeing the validator process use 800MB-1.2GB of ram. Why would we expect it to run within 70MB?

arsulegai (Mon, 11 Feb 2019 17:47:13 GMT):
It was only to get to the issue scenario sooner; otherwise we were seeing it only after days of running

jsmitchell (Mon, 11 Feb 2019 17:49:50 GMT):
How much memory was available to the process in those machines (where you saw issues after days of running)? What did memory consumption look like when the issue occurred? Did that coincide with the increased read IO?

jsmitchell (Mon, 11 Feb 2019 17:55:57 GMT):
If you are saying that one of the sawtooth processes is consuming most of 8GB of ram for stack+heap (NOT fscache), then that sounds like a memory leak.

arsulegai (Mon, 11 Feb 2019 18:01:33 GMT):
Will continue more experiments on it and report here, we have other tests still running

arsulegai (Mon, 11 Feb 2019 18:09:28 GMT):
A random thought: would it make sense if peer validators could also decide the priority of consensus engine messages? I.e., use a new message type (not the gossip message as it is sent today) for sending consensus messages to peers, and increase the chances of those messages being handled by raising their priority.

jsmitchell (Mon, 11 Feb 2019 18:12:08 GMT):
I had a conversation with @ltseeley about that very thing last week

arsulegai (Mon, 11 Feb 2019 18:13:04 GMT):
I think I made a similar comment on one of his PRs in sawtooth-core, in another context

jsmitchell (Mon, 11 Feb 2019 18:13:49 GMT):
But that is probably separate from the possible memory leak as a root-cause discussion. (I.e., if the machine starts swapping, the node is toast regardless of whether it is handling consensus messages with higher priority or not.)

arsulegai (Mon, 11 Feb 2019 18:14:02 GMT):
Agree

Mr-zohaibkhalid (Mon, 11 Feb 2019 19:36:22 GMT):
Has joined the channel.

manojgop (Wed, 13 Feb 2019 04:33:47 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=4yMgtnckaYRH7PsB4) @amundson in the case of consensus algorithms like Raft, transactions sent by clients to the follower nodes will remain in the queue forever if the leader node rejects the batches sent by a follower node via gossip (when the leader queue is temporarily full), unless a follower node becomes the leader and inserts those transactions in a block. These transactions are actually valid but will remain in the follower node's queue until that node becomes the leader. So could we use the time-in-queue or block-horizon approach to clear pending valid transactions from the queue?

arsulegai (Wed, 13 Feb 2019 06:59:21 GMT):
Since we are talking about possible improvements, there are a few other things that would help us, in addition to what @manojgop mentioned: 1. The current "dispatcher queue full" trace could be slightly altered, or a new trace introduced, to state whether the validator has capacity to accept new messages. The current trace prints that the dispatcher queue is full if there are more than 10 messages, but in reality it might not have enough space left. 2. Name the dispatcher queues so that it's identifiable from the logs which interface's messages are getting lost due to lack of memory. This is just to bring more clarity to the logs. 3. The possibility of moving the dispatcher code from Python to Rust.

jaypersanchez (Wed, 13 Feb 2019 11:29:32 GMT):
Has joined the channel.

jaypersanchez (Wed, 13 Feb 2019 11:30:49 GMT):
Hello everyone. How can I start to contribute to Sawtooth?

amundson (Wed, 13 Feb 2019 18:56:19 GMT):
@jaypersanchez do you have a particular area of interest?

arsulegai (Thu, 14 Feb 2019 08:22:35 GMT):
Fork to a new discussion: @jsmitchell @pschwarz @boydjohnson @amundson This is regarding the patch to remove invalid transactions and avoid a potential loop over invalid transactions: https://github.com/hyperledger/sawtooth-core/pull/1994. The current solution there appears to be fine for a node creating a block, but not for nodes that do not participate in creating blocks. Can we think of breaking this into smaller chunks and target solving one problem at a time?

jsmitchell (Thu, 14 Feb 2019 15:38:37 GMT):
@arsulegai what do you propose? The current thoughts I've heard are some kind of time-horizon-based mechanism for purging the pending queue.

arsulegai (Thu, 14 Feb 2019 15:43:17 GMT):
The reason I am not inclined towards a time-bound mechanism is that the consensus engine decides when to create the block. The validator may not be the right candidate to apply such logic; there's a possibility that transactions get removed even before they are properly scheduled.

arsulegai (Thu, 14 Feb 2019 15:48:44 GMT):
How about the number of blocks committed since the arrival of the transaction? This is something in the administrator's control, and the validator cannot be held solely responsible for losing transactions.

arsulegai (Thu, 14 Feb 2019 15:53:12 GMT):
I think I am diverging again; we can classify the discussion into 2 categories: 1. Periodic purging of pending batches, to make sure transactions are not stalled in the queue forever. 2. Clearing out an invalid transaction, even if no block is created, once a processor has executed it and marked it invalid.

amundson (Thu, 14 Feb 2019 16:22:46 GMT):
for (1), do you mean invalid transactions?

amundson (Thu, 14 Feb 2019 16:23:26 GMT):
removing valid transactions at any point could have BFT implications we would need to think through

arsulegai (Thu, 14 Feb 2019 16:33:44 GMT):
I'm ok with periodic execution of pending transactions and removal of only the invalid ones. But executing transactions just to remove invalid ones will increase the number of times a transaction may get executed by a validator to three.

arsulegai (Thu, 14 Feb 2019 16:34:33 GMT):
Hmm, agreed that removal of valid transactions under the above proposal is not a good idea.

amundson (Thu, 14 Feb 2019 16:38:06 GMT):
hmm, that is an interesting idea

amundson (Thu, 14 Feb 2019 16:39:06 GMT):
well, you could calculate a time-in-queue for each transaction, and then periodically run it to see if it has become invalid without worrying about the execution overhead, because it would be done so seldom.

amundson (Thu, 14 Feb 2019 16:39:46 GMT):
i.e. don't test a transaction for validity if it has been in the queue for less than N minutes
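
A minimal sketch of that time-in-queue idea, assuming a hypothetical `pending` list whose entries carry a `received_at` timestamp and an illustrative `revalidate()` hook; none of these names are real sawtooth-core APIs:

```python
import time

MIN_AGE_SECONDS = 10 * 60  # the "N minutes" threshold from the discussion

def purge_stale_invalid(pending, revalidate):
    """Re-test only transactions that have waited long enough, and drop
    only the ones that have actually become invalid."""
    now = time.time()
    keep = []
    for entry in pending:
        if now - entry.received_at < MIN_AGE_SECONDS or revalidate(entry.txn):
            keep.append(entry)  # too young to re-test, or still valid
    pending[:] = keep
```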

arsulegai (Thu, 14 Feb 2019 16:44:25 GMT):
I'm wary of defining the period here with respect to time, or even the number of blocks since the transaction's arrival.

arsulegai (Thu, 14 Feb 2019 16:46:39 GMT):
Adding to the proposal, this idea can be extended to all validators irrespective of commands from their consensus engine.

amolk (Fri, 15 Feb 2019 09:18:13 GMT):
@arsulegai as @jsmitchell mentioned, a block horizon would be a good metric for periodically testing validity of pending transactions. And it could be configured by the administrator.

arsulegai (Fri, 15 Feb 2019 09:23:38 GMT):
Block horizon sounds better than time horizon :)

LeonardoCarvalho (Fri, 15 Feb 2019 14:20:18 GMT):
Yep, Time Horizon reminds me of Black Holes...

danintel (Fri, 15 Feb 2019 16:35:30 GMT):
Then we can call it Event Horizon 🙂

manojgop (Sat, 16 Feb 2019 07:57:46 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=4yMgtnckaYRH7PsB4) @amundson in the case of consensus algorithms like Raft, transactions sent by a client to a follower node will remain in the follower's queue forever if the leader rejects these transactions sent via gossip (if the leader queue is temporarily full), unless the follower node becomes leader at a later point and adds these transactions to a block and commits it. These are actually valid transactions, but they can remain in the follower node's queue for a long time in the above scenario. Can we use the time-in-queue or block-horizon approach to remove these pending valid transactions?

amolk (Sat, 16 Feb 2019 09:38:27 GMT):
I would recommend invalid transactions be purged and valid transactions broadcast once again.

amolk (Sat, 16 Feb 2019 09:55:02 GMT):
There is already a mechanism for a validator to handle duplicate valid transactions, so a re-broadcast by a follower of a transaction already in the leader's queue won't be harmful.

amundson (Sun, 17 Feb 2019 02:15:21 GMT):
valid transactions could be re-broadcast, that would be safe

amundson (Sun, 17 Feb 2019 02:18:02 GMT):
the correct behavior for PBFT (and probably Raft) is to cause a new leader election if the leader is (apparently) refusing to add valid transactions to blocks. the validator should never, ever, drop a valid transaction.

amundson (Sun, 17 Feb 2019 02:21:25 GMT):
I think you might want some randomness in determining whether to do the re-broadcast to prevent all non-leader nodes from broadcasting. With a little more work, it could be turned into send-to-leader instead of broadcast. (this would certainly require Consensus API modifications though)
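
A rough sketch of that randomness (hypothetical names, not the Consensus API): each non-leader node flips a biased coin before re-broadcasting, so the whole network does not re-send at once:

```python
import random

REBROADCAST_PROBABILITY = 0.2  # tuning knob; not a real sawtooth setting

def maybe_rebroadcast(batch, is_leader, send):
    # The leader publishes blocks itself, so it never re-broadcasts.
    if not is_leader and random.random() < REBROADCAST_PROBABILITY:
        send(batch)
```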

kodonnel (Wed, 20 Feb 2019 14:33:42 GMT):
Not sure if there is a better channel for this, since it is a build issue. We run regular builds of sawtooth-core (currently the 1.1.3 tag) and sawtooth-poet (currently the 1.1.2 tag) with no divergences. Up until today the unit tests were passing, but today test_config and test_genesis in sawtooth-core, and test_genesis in sawtooth-poet, have failed with a module import error against the protobufs. My first guess, without digging in, is that this is because a pulled-in SDK has changed. Two questions: a) is this the right channel for that sort of note, and if not, where? b) any pointers to where exactly I might find that dependency before I dive in to see what can be done to avoid this sort of thing in the future? I'd expect a tagged build to stay as stable as it was when tagged and not change day by day.

kodonnel (Wed, 20 Feb 2019 14:35:55 GMT):
Since the builds between core and poet are so similar, I'd presume it's in about the same place in each.

jsmitchell (Wed, 20 Feb 2019 14:39:37 GMT):
@rbuysse ^

amundson (Wed, 20 Feb 2019 14:43:28 GMT):
@kodonnel yes, this is a good channel for that discussion

kodonnel (Wed, 20 Feb 2019 14:46:05 GMT):
cool. I'm in the middle of a long meeting. Hopefully will get a chance to dig into it today.

kodonnel (Wed, 20 Feb 2019 15:03:07 GMT):
For the record and brevity's sake, a snippet of the error I am looking at:
```
unit-poet-cli_1 |     from sawtooth_poet_common.validator_registry_view.validator_registry_view \
unit-poet-cli_1 |   File "/project/sawtooth-poet/common/sawtooth_poet_common/validator_registry_view/validator_registry_view.py", line 17, in <module>
unit-poet-cli_1 |     from sawtooth_poet_common.protobuf.validator_registry_pb2 import ValidatorInfo
unit-poet-cli_1 | ImportError: No module named 'sawtooth_poet_common.protobuf'
```

kodonnel (Wed, 20 Feb 2019 15:04:05 GMT):
and a demonstration of failed formatting :)

manojgop (Wed, 20 Feb 2019 17:02:04 GMT):
@amundson @jsmitchell Do we have a way to disable gossip broadcast in sawtooth-core for a fully connected and statically peered network, like Raft uses? This is to avoid rebroadcast of batches and blocks in the case of a fully connected network. Would it make sense to enable it via some sawtooth setting (meant for admin tasks)?

jsmitchell (Wed, 20 Feb 2019 17:04:57 GMT):
@ltseeley ^

ltseeley (Wed, 20 Feb 2019 17:14:24 GMT):
@manojgop this is something we've had some conversations about and would like to implement somehow. I see two possibilities for accomplishing this: 1) using settings (likely one or more settings in the `sawtooth.consensus.algorithm` namespace), or 2) consensus engine providing some information when it registers with the validator (similar to how it reports its name and version).

ltseeley (Wed, 20 Feb 2019 17:14:42 GMT):
@pschwarz ^

ltseeley (Wed, 20 Feb 2019 17:20:53 GMT):
My initial thoughts are that settings seem like a nice option because they wouldn't require changes to APIs/SDKs. However, it might be useful for these decisions to be made by the consensus engine and not an administrator. But I suppose the peering configuration (static vs. dynamic, the list of peers) is already required to be set up by the admin, so maybe that wouldn't be as big of a deal.

amundson (Wed, 20 Feb 2019 17:28:22 GMT):
I don't think it's a setting or even an option for gossip. you just want a new call that is a straight broadcast. don't change the behavior of gossip, just don't use it.

amundson (Wed, 20 Feb 2019 17:31:44 GMT):
ic - the piece not controlled by consensus at all currently? the hint from the consensus engine seems better.

kodonnel (Wed, 20 Feb 2019 17:46:50 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=WdRSCQRmyLmJ8Jf43) So I believe I have zeroed in on the issue here, and it isn't a stray dynamic dependency as I suspected. It's more just a confusing aspect of the build process, which happens to be common to core and poet. Just waiting for a good build to confirm.

pschwarz (Wed, 20 Feb 2019 17:49:51 GMT):
It definitely seems like a hint is better, since the behaviour is very dependent on the consensus engine that is active

amundson (Wed, 20 Feb 2019 17:51:03 GMT):
what if the nodes are supposed to usually be fully connected, but there is a network problem preventing that from being the case? would removing gossip break CFT or BFT expectations?

arsulegai (Wed, 20 Feb 2019 17:56:01 GMT):
Sorry to interrupt: isn't the broadcast issue in the ambit of the validator rather than the consensus engine?

jsmitchell (Wed, 20 Feb 2019 17:56:21 GMT):
the consensus is what provides those guarantees. If the consensus requires a certain topology, then asserting that topology is a complex problem.

arsulegai (Wed, 20 Feb 2019 17:57:25 GMT):
The network administrator makes the decision of whether the network is fully connected when setting up validators.

jsmitchell (Wed, 20 Feb 2019 17:57:39 GMT):
they are related @arsulegai - the decision not to re-broadcast is only possible due to the fully-connected nature of the network

arsulegai (Wed, 20 Feb 2019 17:58:03 GMT):
I agree that dynamic switching of consensus would require consensus knowledge.

jsmitchell (Wed, 20 Feb 2019 17:58:34 GMT):
furthermore, no node knows that the topology is fully connected

arsulegai (Wed, 20 Feb 2019 17:59:35 GMT):
Agree. Probably I was not clear with my question.

arsulegai (Wed, 20 Feb 2019 18:00:15 GMT):
I'm trying to map this with re-broadcast of batch and block which are received over gossip.

jsmitchell (Wed, 20 Feb 2019 18:00:30 GMT):
If the node knew the topology, it could make smarter decisions about when to rebroadcast

jsmitchell (Wed, 20 Feb 2019 18:00:48 GMT):
on a gradient from 0% of the time (fully connected) to some stochastic max

jsmitchell (Wed, 20 Feb 2019 18:02:10 GMT):
i.e. if a node is a 'bridge' node to a poorly connected neighborhood of nodes, it should broadcast to that peer 100% of the time, probably

jsmitchell (Wed, 20 Feb 2019 18:02:36 GMT):
whereas if a node knows that the network is fully connected, it never needs to rebroadcast

arsulegai (Wed, 20 Feb 2019 18:03:54 GMT):
Ok, so you're saying that since the consensus engine mandates a certain network topology (for example, in Raft it's a must to have a fully connected network), a hint from the consensus engine will let the validator decide not to re-broadcast.

jsmitchell (Wed, 20 Feb 2019 18:04:08 GMT):
that way, the behavior is a consequence of the shape of the network, rather than a directive by the consensus engine

jsmitchell (Wed, 20 Feb 2019 18:04:43 GMT):
no, i am saying decouple the assertion on the shape of the network from the gossip behavior based on the shape of the network

jsmitchell (Wed, 20 Feb 2019 18:05:27 GMT):
RAFT can assert that the network needs to be fully connected. That occurs via some magic mechanism (probably human action). The networking piece recognizes that the network is fully connected and chooses not to rebroadcast.

arsulegai (Wed, 20 Feb 2019 18:06:33 GMT):
Ah! That's where I was heading. My initial question was to ask why we are trying to tie this to the consensus engine's hint, because I saw discussions on that.

arsulegai (Wed, 20 Feb 2019 18:06:47 GMT):
I guess I'm in sync with your thoughts.

jsmitchell (Wed, 20 Feb 2019 18:07:19 GMT):
easier said than done :)

jsmitchell (Wed, 20 Feb 2019 18:08:26 GMT):
i'm not sure it's even practical

jsmitchell (Wed, 20 Feb 2019 18:09:10 GMT):
especially when you factor in the need to deal with byzantine responses from peers

jsmitchell (Wed, 20 Feb 2019 18:10:48 GMT):
nodes would probably need to sign a message listing who they are peered with and distribute it around the network

jsmitchell (Wed, 20 Feb 2019 18:10:55 GMT):
on a fairly frequent basis

jsmitchell (Wed, 20 Feb 2019 18:11:06 GMT):
so that all nodes can keep an up to date model of the shape of the network

arsulegai (Wed, 20 Feb 2019 18:12:00 GMT):
A parallel chain concept?

jsmitchell (Wed, 20 Feb 2019 18:12:08 GMT):
no, it's a graph

jsmitchell (Wed, 20 Feb 2019 18:12:24 GMT):
and it's not validated like blocks or state

jsmitchell (Wed, 20 Feb 2019 18:12:43 GMT):
we'd just check signatures to make sure that bad nodes aren't lying about the shape of the network

jsmitchell (Wed, 20 Feb 2019 18:13:29 GMT):
when you receive a message from a peer, you'd look at the current shape of the network and determine the likelihood that your peers would already be receiving the message via an alternate path

jsmitchell (Wed, 20 Feb 2019 18:13:43 GMT):
in a fully connected network, this would always be a 100% probability

jsmitchell (Wed, 20 Feb 2019 18:13:59 GMT):
so, above some threshold, say 90%, you would not retransmit
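
A toy version of that calculation, assuming each node has assembled the signed peer lists into an adjacency map `{node_id: set_of_peer_ids}`; the names and the independence assumption are illustrative only, not validator internals:

```python
def alternate_path_probability(topology, origin, me, peer):
    """Estimate the chance that `peer` already saw the message another way."""
    origin_peers = topology.get(origin, set())
    if peer in origin_peers:
        return 1.0  # fully connected case: peer hears it first-hand
    # Crude model: each common neighbor independently relays half the time.
    common = origin_peers & (topology.get(peer, set()) - {me})
    return 1.0 - 0.5 ** len(common)

def should_rebroadcast(topology, origin, me, peer, threshold=0.9):
    return alternate_path_probability(topology, origin, me, peer) < threshold
```

In a fully connected network the probability is always 1.0, so nothing is re-sent; for a 'linked list' topology it is 0.0 toward the next hop, so the relay always happens, matching the two extremes described above.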

arsulegai (Wed, 20 Feb 2019 18:14:04 GMT):
Ok, that would be too frequent, I guess. For example, in Raft we'll end up having more node-discovery messages along with the consensus heartbeat messages.

jsmitchell (Wed, 20 Feb 2019 18:14:23 GMT):
well, you could decide the frequency of the topology announcements

arsulegai (Wed, 20 Feb 2019 18:14:28 GMT):
It needs careful modelling

arsulegai (Wed, 20 Feb 2019 18:14:30 GMT):
Yeah

jsmitchell (Wed, 20 Feb 2019 18:14:34 GMT):
yes, it's complex

jsmitchell (Wed, 20 Feb 2019 18:15:50 GMT):
@amundson @ltseeley @pschwarz ^

ltseeley (Wed, 20 Feb 2019 18:17:18 GMT):
@jsmitchell is this a use case that's actually been discussed before?

jsmitchell (Wed, 20 Feb 2019 18:17:27 GMT):
in broad generalities

jsmitchell (Wed, 20 Feb 2019 18:18:00 GMT):
the behavior of the gossip layer is far from ideal, currently

jsmitchell (Wed, 20 Feb 2019 18:18:13 GMT):
we have certainly discussed stochastic broadcast before

ltseeley (Wed, 20 Feb 2019 18:18:29 GMT):
Like, is there some barrier that would prevent a consortium from setting up a reliably fully peered network?

jsmitchell (Wed, 20 Feb 2019 18:18:38 GMT):
this represents more of a complete solution to the problem

jsmitchell (Wed, 20 Feb 2019 18:18:53 GMT):
@ltseeley like firewalls? sure

jsmitchell (Wed, 20 Feb 2019 18:19:45 GMT):
there are going to be static intentional barriers (like firewall settings), and dynamic unintentional barriers (like nodes going offline, network segmentation, etc)

jsmitchell (Wed, 20 Feb 2019 18:20:33 GMT):
it would be really keen if the design of the gossip piece properly took those things into consideration and behaved optimally

jsmitchell (Wed, 20 Feb 2019 18:21:28 GMT):
i.e. for a fully connected network, recognize that the chance of delivery is 100% and not rebroadcast, but for a 'linked list' connectivity, the need is to rebroadcast 100% of the time

jsmitchell (Wed, 20 Feb 2019 18:21:54 GMT):
because the node recognizes that its rebroadcast is the _only_ way this peer is going to receive the message

ltseeley (Wed, 20 Feb 2019 18:24:01 GMT):
Well, if a consortium is going to set up a network running PBFT, for instance, is there a reason they wouldn't be willing to configure the firewall to allow communication with all the other nodes? Is there some other factor that would prevent them from wanting to connect to some other organization's node(s)? And the "dynamic unintentional barriers" are supposed to be solved by the algorithm to some extent (Raft is crash fault tolerant with fewer than 1/2 of nodes unavailable, PBFT with fewer than 1/3).

jsmitchell (Wed, 20 Feb 2019 18:24:35 GMT):
this is more a discussion about gossip doing the right thing than consensus

jsmitchell (Wed, 20 Feb 2019 18:24:55 GMT):
consensus kind of runs as a virtual layer on top of the gossip network

jsmitchell (Wed, 20 Feb 2019 18:25:04 GMT):
it is more efficient if it is fully connected

jsmitchell (Wed, 20 Feb 2019 18:25:10 GMT):
but, that is not a hard and fast requirement

ltseeley (Wed, 20 Feb 2019 18:25:31 GMT):
Got it

kodonnel (Wed, 20 Feb 2019 18:30:40 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=fvDDvFWjSas2WePJo) So the problem was self-created in the end. We disabled the lint-related steps for the moment, in particular for sawtooth-poet, which has an explicit build of poet-common (not installed) that the unit tests use. Caught us by surprise. So our fault, but I'm not in love with how tangled the builds and build steps are. Just sayin' :grin:

jsmitchell (Wed, 20 Feb 2019 18:32:37 GMT):
@rbuysse ^

ltseeley (Wed, 20 Feb 2019 18:41:15 GMT):
So it sounds like a non-relaying broadcast call on the consensus API and consensus hints would both be incomplete solutions, because neither really takes into consideration the actual topology of the network (they assume that the actual topology matches the expected topology). Am I understanding that correctly?

jsmitchell (Wed, 20 Feb 2019 18:44:59 GMT):
yes

jsmitchell (Wed, 20 Feb 2019 22:32:56 GMT):
@amundson @pschwarz thoughts on above discussion?

amundson (Thu, 21 Feb 2019 00:17:04 GMT):
a lot of that makes sense, in that you are suggesting the network layer should be good enough that consensus shouldn't need to worry about it

amundson (Thu, 21 Feb 2019 00:17:33 GMT):
but, I wonder if it's odd that we are doing any gossip activity at all with PBFT/Raft

jsmitchell (Thu, 21 Feb 2019 00:48:45 GMT):
Well, the argument would be that we wouldn’t be doing any gossip on a fully-connected network. Whether we would allow PBFT/Raft to run on a temporarily or permanently non-fully connected network seems like a somewhat different question.

jsmitchell (Thu, 21 Feb 2019 00:50:25 GMT):
But it feels like something that behaves correctly in the fully connected case, and is resilient to temporary non-fully-connected states, would be a nice benefit

manojgop (Thu, 21 Feb 2019 10:57:39 GMT):
@jsmitchell @amundson @ltseeley @pschwarz Can we have a temporary solution (e.g. a sawtooth setting) introduced to avoid gossip message rebroadcast for a fully connected network like Raft? In the case of Raft we observe queue-full issues and batch rejections when we increase the TPS to 10 on a 5-node network. But with gossip message rebroadcast disabled we are able to run at 10 TPS for more than 24 hours, and it is still running without any batch rejections. I'm yet to verify the max TPS supported in Raft with gossip rebroadcast disabled. Meanwhile, if you have some suggestions to improve the TPS for Raft, let me know.

manojgop (Thu, 21 Feb 2019 11:16:36 GMT):
I saw a setting "sawtooth.gossip.time_to_live" in https://github.com/hyperledger/sawtooth-core/blob/1a1e7eb19c32d00bcadb49d5bdfac939c323d852/validator/sawtooth_validator/gossip/gossip.py#L272. Is it possible to set this to 0 to avoid gossip rebroadcast?
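
(For reference, a settings proposal like this would normally be submitted with the sawset CLI, e.g. `sawset proposal create -k /etc/sawtooth/keys/validator.priv sawtooth.gossip.time_to_live=1`, where the key path is only an example; whether this setting actually suppresses re-broadcast is exactly what the next few messages debate.)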

jsmitchell (Thu, 21 Feb 2019 16:17:51 GMT):
I don't think that's going to avoid it. You can comment out the broadcast handler for local testing. I doubt we would entertain a band-aid PR.

arsulegai (Thu, 21 Feb 2019 16:28:15 GMT):
But per the code, it appears that setting this to a value less than or equal to 1 would stop re-broadcast. What was the other reason to have this setting?

jsmitchell (Thu, 21 Feb 2019 16:36:05 GMT):
i mean, go ahead and give it a try

arsulegai (Thu, 21 Feb 2019 16:42:37 GMT):
Yeah, in the current run I've commented out block and batch re-broadcast. Consensus gossip messages were just being notified to the engine even though they're handled by the broadcast handler, so I didn't touch that part.

pschwarz (Thu, 21 Feb 2019 16:48:53 GMT):
So, really, it sounds like we should be implementing a stochastic gossip layer, which

jsmitchell (Thu, 21 Feb 2019 16:49:13 GMT):
yeah, i raised that with @ltseeley, @arsulegai. The consensus message handling in there is totally inappropriate.

ltseeley (Thu, 21 Feb 2019 16:49:45 GMT):
There's a PR to update it. Can you take a look, @jsmitchell? https://github.com/hyperledger/sawtooth-core/pull/2019

ltseeley (Thu, 21 Feb 2019 16:52:42 GMT):
@arsulegai @manojgop seems to me like setting TTL to 1 should do what we want it to. If you try it out, let me know how it goes.

danintel (Thu, 21 Feb 2019 16:53:39 GMT):
What's the minimum number of nodes for Raft to behave in a sane way? 3? 2? 1?

ltseeley (Thu, 21 Feb 2019 16:54:44 GMT):
iirc it should work with 1

ltseeley (Thu, 21 Feb 2019 16:55:07 GMT):
Just elects itself

jsmitchell (Thu, 21 Feb 2019 16:55:11 GMT):
left a couple of comments @ltseeley

ltseeley (Thu, 21 Feb 2019 17:09:08 GMT):
Responded

neewy (Mon, 25 Feb 2019 11:03:05 GMT):
Can someone please explain how the block publisher is selected? How are candidate blocks composed? Thanks

arsulegai (Mon, 25 Feb 2019 13:29:56 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=G76PfvnWnQ7T3ZvaW) @ltseeley It doesn't work. Probably @jsmitchell had foreseen this. Consensus messages get ignored even with a fully connected network. Your pending PR needs to go in; let's plan for it. The immediate next step could be to bring in a separate message type for consensus gossip broadcasting, so that priority can be set when processing those messages.

arsulegai (Mon, 25 Feb 2019 13:31:42 GMT):
@neewy I didn't understand what you mean by how the block publisher is selected. A candidate block is composed when the consensus engine requests that the validator node create the block.

ltseeley (Mon, 25 Feb 2019 14:10:18 GMT):
@arsulegai did you try setting it to 1, or 0?

arsulegai (Mon, 25 Feb 2019 14:19:11 GMT):
In either case, the consensus message isn't passed to the consensus engine

ltseeley (Mon, 25 Feb 2019 15:05:13 GMT):
Ah yes, I see. With that PR, it should work.

arsulegai (Mon, 25 Feb 2019 16:25:18 GMT):
The PR looks ok to me - the aim is to have a separate handler; the other discussions in that review can be in their own PRs.

cliveb (Mon, 25 Feb 2019 18:35:46 GMT):
Has joined the channel.

rjones (Mon, 25 Feb 2019 19:22:10 GMT):
Has joined the channel.

rjones (Mon, 25 Feb 2019 19:23:00 GMT):
Hi! The internship program has been extended a week, and we don't have any Sawtooth applications. If you want some free-ish developers: https://wiki.hyperledger.org/display/INTERN/2019+Projects

agoops (Tue, 26 Feb 2019 00:10:48 GMT):
Hello, I am interested in learning about how blocks become committed in sawtooth-core, and how that affects the building of a new candidate block. I had a handful of questions, which I hope I could get some help with! I am curious about this block of code in `SyncBlockPublisher.on_chain_updated`: https://github.com/hyperledger/sawtooth-core/blob/c65462ea57e07671d5eb217fb3fa5a68057ed18d/validator/src/journal/publisher.rs#L185-L189 It seems that it checks if there was a `candidate_block` already being built, then it cancels that block, and calls `self.initialize_block` to begin building a new candidate_block. However, it builds this "new" candidate block off of the same `previous_block_id` that the deleted candidate_block was being built off of. Why is this the case? I would assume that since `on_chain_updated` is called with a `chain_head`, any subsequent building of candidate_blocks should be off the new chain_head. Are there reasons that a validator may keep building off of the previous block like that? Any help in understanding this piece would be greatly appreciated!

Lotus (Tue, 26 Feb 2019 14:57:57 GMT):
Has joined the channel.

arsulegai (Wed, 27 Feb 2019 03:10:20 GMT):
@amundson @mfford I am interested in the sawtooth-core contributor meetings; could you please let us know when we are planning to hold them?

mfford (Wed, 27 Feb 2019 15:35:40 GMT):
@arsulegai Thanks for checking in on this. We will communicate scheduling details to the whole community once they are firm, so stay tuned!

agoops (Wed, 27 Feb 2019 17:54:53 GMT):
Hi @jsmitchell , I am curious if you can point me in the right direction regarding my question^. Is there a better channel to post questions like this?

wkatsak (Thu, 28 Feb 2019 18:24:22 GMT):
Has joined the channel.

wkatsak (Thu, 28 Feb 2019 18:24:32 GMT):
Hello everyone

wkatsak (Thu, 28 Feb 2019 18:24:43 GMT):
i'm finding myself in need of building debs for 1.1.4

wkatsak (Thu, 28 Feb 2019 18:24:51 GMT):
(or, insert version here)

wkatsak (Thu, 28 Feb 2019 18:25:02 GMT):
what's the quickest way to do this?

wkatsak (Thu, 28 Feb 2019 18:25:14 GMT):
im looking at BUILD.md

wkatsak (Thu, 28 Feb 2019 18:25:24 GMT):
but this is giving me a version mismatch error if i check out 1.1.4

wkatsak (Thu, 28 Feb 2019 18:44:00 GMT):
Any help would be greatly appreciated

amundson (Thu, 28 Feb 2019 18:59:32 GMT):
@wkatsak we recommend you use the published debs - any reason that isn't working for you?

jsmitchell (Thu, 28 Feb 2019 20:11:21 GMT):
@agoops huh, that does look a little weird. @pschwarz what do you think? (about one screen up)

pschwarz (Thu, 28 Feb 2019 20:56:47 GMT):
The main reason why it builds off the previously used `previous_block_id` is that the consensus engine explicitly tells the validator what block to build a new block on top of. The commit message will most likely be paired with a cancel block message, but it still needs to handle the case where the publishing is in progress. It cancels and restarts the block in order to remove any batches that may have been included in the committed block.
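
In pseudocode, the flow described here looks roughly like the following (illustrative Python, not the actual publisher.rs logic; the consensus engine sends a separate message if it wants building to move to the new head):

```python
def on_chain_updated(publisher, chain_head):
    candidate = publisher.candidate_block
    if candidate is not None:
        parent_id = candidate.previous_block_id
        publisher.cancel_block()              # discard the in-flight candidate
        # Rebuild on the SAME parent the consensus engine last asked for;
        # restarting filters out batches already committed in `chain_head`.
        publisher.initialize_block(parent_id)
```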

wkatsak (Thu, 28 Feb 2019 21:21:40 GMT):
@amundson we need to make some small patches and want to be able to deploy to test, without doing packaging manually

wkatsak (Thu, 28 Feb 2019 21:22:06 GMT):
@amundson In particular, ZMQ disables ipv6 support by default. I think we've found a workaround, but we need to patch.

amundson (Thu, 28 Feb 2019 21:39:02 GMT):
@rbuysse ^

rbuysse (Thu, 28 Feb 2019 21:40:19 GMT):
@wkatsak Can you post the exact error you're getting?

wkatsak (Thu, 28 Feb 2019 21:42:16 GMT):
I was running `docker build -f validator/Dockerfile-installed .`

wkatsak (Thu, 28 Feb 2019 21:44:11 GMT):
"VERSION file and (bumped?) git describe versions differ:"

rbuysse (Thu, 28 Feb 2019 21:45:40 GMT):
what do you see if you run `git tag`

wkatsak (Thu, 28 Feb 2019 21:45:56 GMT):
`v0.6.0 v0.6.1 v0.7.0 v0.7.1 v0.8.0 v0.8.1 v0.8.10 v0.8.11 v0.8.12 v0.8.13 v0.8.2 v0.8.3 v0.8.4 v0.8.5 v0.8.6 v0.8.7 v0.8.8 v0.8.9 v0.9.0 v1.0.0 v1.0.0rc1 v1.0.0rc2 v1.0.0rc3 v1.0.0rc4 v1.0.0rc5 v1.0.0rc6 v1.0.0rc7 v1.0.1 v1.0.2 v1.0.3 v1.0.4 v1.0.5 v1.1.0 v1.1.1 v1.1.2 v1.1.3 v1.1.4 v1.2.0`

wkatsak (Thu, 28 Feb 2019 21:46:20 GMT):
interestingly, i was futzing with my forked branch, and now i can't reproduce it

wkatsak (Thu, 28 Feb 2019 21:46:21 GMT):
lol

wkatsak (Thu, 28 Feb 2019 21:46:37 GMT):
maybe you can explain how the build works. what does the VERSION envvar passed to that container do?

wkatsak (Thu, 28 Feb 2019 21:46:43 GMT):
is that just for tagging?

wkatsak (Thu, 28 Feb 2019 21:46:50 GMT):
or does it check out the code at some tag or branch?

rbuysse (Thu, 28 Feb 2019 21:47:26 GMT):
that's just for versioning the build artifacts

wkatsak (Thu, 28 Feb 2019 21:47:57 GMT):
so, it will build whatever is checked out?

rbuysse (Thu, 28 Feb 2019 21:48:01 GMT):
the dockerfile runs ./bin/get_version so the .debs can be versioned properly

rbuysse (Thu, 28 Feb 2019 21:48:03 GMT):
yeah

wkatsak (Thu, 28 Feb 2019 21:50:01 GMT):
ah ok, thanks, that helps

wkatsak (Thu, 28 Feb 2019 21:50:11 GMT):
and then, you just extract the debs from the container image?

rbuysse (Thu, 28 Feb 2019 21:50:52 GMT):
from sawtooth-core/ run `docker-compose -f docker-compose-installed.yaml build validator`

kodonnel (Thu, 28 Feb 2019 21:51:28 GMT):
would be nice if that was a little more flexible actually. the AUTO_STRICT deep down in there can be a bit of a pain. don't get me wrong, I like it overall

rbuysse (Thu, 28 Feb 2019 21:51:49 GMT):
then run `docker-compose -f docker/compose/copy-debs.yaml up validator`

rbuysse (Thu, 28 Feb 2019 21:52:44 GMT):
@kodonnel we've been talking about ways to improve it

kodonnel (Thu, 28 Feb 2019 21:53:16 GMT):
don't know if I have any better ideas, really, but I've been bitten by it in the past a lot.

rbuysse (Thu, 28 Feb 2019 21:53:46 GMT):
@wkatsak after you run copy-debs, your deb file will be in build/debs

wkatsak (Thu, 28 Feb 2019 22:02:15 GMT):
who runs copy-debs?

wkatsak (Thu, 28 Feb 2019 22:02:21 GMT):
i can run inside the container?

kodonnel (Thu, 28 Feb 2019 22:14:23 GMT):
@rbuysse the only ideas I've had on it aren't particularly compatible with the rust/cargo semantic versioning, which is fairly strict. It's all pretty smooth if you are working at the leading version, but if you are experimenting with HEAD-1 then it gets tricky. Actually, not tricky so much as you find out you made the mistake an hour after you made it. An early, cheap gating check before the heart of the build would probably go a long way to reduce frustration, and a little guide maybe.

wkatsak (Thu, 28 Feb 2019 22:24:58 GMT):
@rbuysse What's the right way to run copy-debs?

wkatsak (Thu, 28 Feb 2019 23:53:44 GMT):
@rbuysse Sorry, my mind glossed over what you copy-pasted

wkatsak (Thu, 28 Feb 2019 23:53:48 GMT):
looks like i got it working

rbuysse (Thu, 28 Feb 2019 23:58:27 GMT):
awesome!

wkatsak (Fri, 01 Mar 2019 00:00:25 GMT):
We are trying to patch to enable IPv6

wkatsak (Fri, 01 Mar 2019 00:00:30 GMT):
we have a pure v6 testnet

wkatsak (Fri, 01 Mar 2019 00:01:06 GMT):
I don't think there is any fundamental problem, just some "bad" assumptions made in certain places

amundson (Fri, 01 Mar 2019 14:30:03 GMT):
@wkatsak cool

danintel (Fri, 01 Mar 2019 16:23:56 GMT):
@wkatsak What did you have to do to enable IPv6? Patch and rebuild ZMQ? Or was it something simpler?

wkatsak (Fri, 01 Mar 2019 19:05:35 GMT):
@danintel We are still working on bringing it up, but so far I've patched the validator socket setup to call setsockopt with the ZMQ IPv6 flag

wkatsak (Fri, 01 Mar 2019 19:05:58 GMT):
@danintel and in the REST API, the code that parses the config breaks on an address like [::]

wkatsak (Fri, 01 Mar 2019 19:06:02 GMT):
we patched that as well
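
For anyone following along, the ZMQ side of such a patch is a one-line socket option in pyzmq (`zmq.IPV6` is a real option; the surrounding setup here is simplified, not the validator's actual code):

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.ROUTER)
sock.setsockopt(zmq.IPV6, 1)   # libzmq sockets are IPv4-only by default
sock.bind("tcp://[::]:8800")   # IPv6 any-address endpoint
```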

danintel (Fri, 01 Mar 2019 19:08:41 GMT):
@wkatsak Maybe you can file a PR or at least a JIRA ticket with your exact changes when you are done.

wkatsak (Fri, 01 Mar 2019 19:08:58 GMT):
Yes, once I have it all working, we will find some way to get it back

Quasso (Wed, 06 Mar 2019 12:04:25 GMT):
Has joined the channel.

kodonnel (Wed, 06 Mar 2019 15:01:30 GMT):
Looks like something has changed in the rustup-init script:
```
Removing intermediate container 8099f9266063
 ---> 46a362b29524
Step 7/15 : RUN curl https://sh.rustup.rs -sSf > /usr/bin/rustup-init && chmod +x /usr/bin/rustup-init && rustup-init -y
 ---> Running in cc48bc90b286
/usr/bin/rustup-init: line 18: RUSTUP_UPDATE_ROOT: unbound variable
Service 'validator' failed to build: The command '/bin/sh -c curl https://sh.rustup.rs -sSf > /usr/bin/rustup-init && chmod +x /usr/bin/rustup-init && rustup-init -y' returned a non-zero code: 1
```

kodonnel (Wed, 06 Mar 2019 15:03:00 GMT):
getting that in several of the project builds today

kodonnel (Wed, 06 Mar 2019 15:12:10 GMT):
weird, I don't see why that error should occur based on the script; I also don't see how any of last night's rustup changes should cause that.

kodonnel (Wed, 06 Mar 2019 15:13:40 GMT):
oh i see it now

kodonnel (Wed, 06 Mar 2019 15:21:17 GMT):
https://github.com/rust-lang/rustup.rs/issues/1684
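
(The failure was rustup-init referencing `RUSTUP_UPDATE_ROOT` under `set -u` before it was defined; until the upstream fix landed, a plausible workaround was to pre-set the variable to its default, e.g. `RUSTUP_UPDATE_ROOT=https://static.rust-lang.org/rustup rustup-init -y`.)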

satelander (Wed, 06 Mar 2019 15:21:50 GMT):
Has joined the channel.

agunde (Wed, 06 Mar 2019 15:24:24 GMT):
@kodonnel Thanks for looking into this, we were just trying to figure out what was happening as well.

agunde (Wed, 06 Mar 2019 15:25:31 GMT):
There is a PR up to fix it now https://github.com/rust-lang/rustup.rs/pull/1683/files

kodonnel (Wed, 06 Mar 2019 15:42:39 GMT):
there's an interim fix up and live now.

pankajcheema (Thu, 07 Mar 2019 05:50:14 GMT):
Has joined the channel.

Keegan-Lee (Thu, 07 Mar 2019 19:48:16 GMT):
Has joined the channel.

rohitkhatri (Fri, 08 Mar 2019 10:46:39 GMT):
Has left the channel.

MarcVauclairNXP (Tue, 12 Mar 2019 13:59:03 GMT):
Has joined the channel.

MarcVauclairNXP (Tue, 12 Mar 2019 13:59:19 GMT):
Hello, does anyone know how to reach the Hyperledger Sawtooth documentation maintainer? Goal: reporting improvements to the documentation

arsulegai (Tue, 12 Mar 2019 15:01:51 GMT):
^ @achenette

achenette (Tue, 12 Mar 2019 15:05:06 GMT):
Thanks, arsulegai! @MarcVauclairNXP - I work on the documentation for Hyperledger Sawtooth (and other projects). I would be happy to see your suggestions, in this channel or #sawtooth or in a direct message.

MarcVauclairNXP (Tue, 12 Mar 2019 15:14:24 GMT):
Thank you both. I have sent my suggestions in a direct message to you, @achenette.

Dan (Thu, 14 Mar 2019 02:50:54 GMT):
Mic is looking for support on the identity whitepaper. Can someone familiar with Pike drop a line or two in there? https://docs.google.com/document/d/10D0WgbMV91YBPzKTutc5TNirDC1RRzB_8GSF84hv4l4/edit#

Dan (Thu, 14 Mar 2019 02:51:18 GMT):
(search for Pike in the doc to take you to a place to put it.)

arsulegai (Fri, 15 Mar 2019 15:12:16 GMT):
Have you seen such traces? validator-0: [2019-03-14 15:21:45.659 INFO interconnect] No response from OutboundConnectionThread-tcp://validator-1:8800 in 1552569580.1803834 seconds - removing connection.

arsulegai (Fri, 15 Mar 2019 15:12:35 GMT):
The interesting part is the time in seconds printed in the log.

arsulegai (Fri, 15 Mar 2019 18:59:57 GMT):
^ @danintel

mgkm (Fri, 15 Mar 2019 21:27:40 GMT):
Has joined the channel.

danintel (Fri, 15 Mar 2019 23:29:58 GMT):
No I have not seen that error. It seems like a network connectivity issue, with garbage values in the error message.

arsulegai (Sat, 16 Mar 2019 03:20:45 GMT):
The connection was removed within 2 seconds, so I think variables getting garbage values is a problem

arsulegai (Sun, 17 Mar 2019 04:57:10 GMT):
^ @pschwarz @agunde @ltseeley Have you seen such cases? Can this be an issue?

amolk (Mon, 18 Mar 2019 09:58:49 GMT):
Is there a Sawtooth contributors' meeting this week?

pschwarz (Mon, 18 Mar 2019 13:25:27 GMT):
@arsulegai I haven't seen that particular message. There might have been some incorrect math, by the look of it. Perhaps the original time marked was lost (and therefore `0`).

arsulegai (Mon, 18 Mar 2019 14:40:21 GMT):
Thanks @pschwarz, the reason for this is still unknown; it didn't appear again. But it looks serious if for some reason the time value was marked as 0.

arsulegai (Mon, 18 Mar 2019 15:16:17 GMT):
@jsmitchell @amundson There was a discussion on periodic execution of pending batches and rebroadcast, if batches for some reason do not get added to a block even after waiting for a long time. Do you consider this a feature going into the next Sawtooth 1.2 release?

amundson (Mon, 18 Mar 2019 15:48:40 GMT):
@amolk next week I think - today was the Grid contributor meeting

amundson (Mon, 18 Mar 2019 15:57:54 GMT):
@arsulegai As long as it is stable w/PBFT and we do not have any obvious regressions, I think we should proceed with the release. Currently we know there are some regressions that need to be addressed (re: PoET liveness tests).

amundson (Mon, 18 Mar 2019 15:58:36 GMT):
We could pick a feature cut-off date, say the end of the month, to get in any new features.

mfford (Mon, 18 Mar 2019 16:11:15 GMT):
@amolk The HL Sawtooth Contributor Meeting was just added to the HL Community Calendar for Monday, March 25th: https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings

arsulegai (Tue, 19 Mar 2019 14:32:15 GMT):
Thanks @amundson, there are a few other issues in the validator, such as instability of validator connections due to "maximum-peer-connectivity"

arsulegai (Tue, 19 Mar 2019 14:32:43 GMT):
Solving them could be beneficial for PBFT as well

amundson (Tue, 19 Mar 2019 14:36:12 GMT):
@arsulegai Are you suggesting that there is a regression since 1.1 for that?

arsulegai (Tue, 19 Mar 2019 15:47:03 GMT):
I don't know if it's in 1.1 as well; at least I see that issue in the latest master

amundson (Tue, 19 Mar 2019 17:09:56 GMT):
The poet2 master branch has lint errors because of a change in pylint.

Nonj (Thu, 21 Mar 2019 17:18:21 GMT):
Has joined the channel.

Nonj (Thu, 21 Mar 2019 17:20:50 GMT):
Hi folks, I'm currently on the sawtooth/NEXT project and we are looking to upgrade our sawtooth_sdk to version 1.1.4 from 1.0.5. The issue I'm currently running into is that I see that the protobuf package in 1.1.4 got replaced by a package called "consensus" in 1.0.5. Can anyone confirm that? Or am I missing something crucial?

amundson (Thu, 21 Mar 2019 19:09:07 GMT):
@Nonj when you say 'package', do you mean in terms of python or ubuntu?

Nonj (Thu, 21 Mar 2019 19:32:27 GMT):
sorry, python

Nonj (Thu, 21 Mar 2019 21:39:07 GMT):
I also just checked, the python package seems to come with sawtooth_sdk in v1.1.3

amundson (Fri, 22 Mar 2019 03:56:35 GMT):
We need to solve proxy support some other way than putting it within the Dockerfiles, or drop support for proxies. An ideal solution would require no support within a Dockerfile. Maybe folks behind a proxy should just use VMs instead of Docker. Thoughts?

amundson (Fri, 22 Mar 2019 03:58:49 GMT):
@Nonj it may not be intentional. will require some research.

st (Fri, 22 Mar 2019 08:39:23 GMT):
Has left the channel.

mfford (Fri, 22 Mar 2019 14:13:26 GMT):
Another reminder that our first Hyperledger Sawtooth Contributor Meeting is Monday, March 25th at 10am CDT. The meeting information can be found on the Hyperledger Community Meetings Calendar located here: https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings You can still add topics to the agenda for this meeting. If you have an appropriate topic you would like to discuss and facilitate, please add it to the agenda, located in the wiki here: https://wiki.hyperledger.org/pages/viewpage.action?pageId=6427754 Looking forward to seeing everyone at our first meeting! -Mark

kodonnel (Tue, 26 Mar 2019 14:38:01 GMT):
Have we considered moving the basic protobuf definitions out of the SDK packages? Seems like having copies of that all over the various repositories is asking for trouble.

amundson (Tue, 26 Mar 2019 18:46:17 GMT):
@kodonnel the separation is intentional. the contract between the components is the serialized format, not the proto definition. if we rename or add something to the proto definition in core, the SDKs and other components can adopt it on their own timeline (often requiring code changes and/or support for the updates).

amundson (Tue, 26 Mar 2019 18:47:44 GMT):
that said, I think we need to remove use of the protobuf objects from our APIs entirely so it's not leaking out (and so that we can provide better (more natural) APIs than the protobuf APIs provide)

amundson (Tue, 26 Mar 2019 18:47:59 GMT):
in other words, make it an implementation detail

kodonnel (Tue, 26 Mar 2019 18:52:01 GMT):
I'd second that last. But to your point about serialized format not proto - is the format formally defined/documented anywhere apart from the protobufs?

amundson (Tue, 26 Mar 2019 18:53:50 GMT):
the protobufs are a formal definition. the serialization format is defined by protobuf.

kodonnel (Tue, 26 Mar 2019 18:58:33 GMT):
OK, so that's splitting legal hairs then. :) It still seems like it would not be a terrible idea to split that definition out. I get that the individual SDKs and whatnot should be free to move along that timeline according to their schedule. But you've effectively got a dependency there which isn't explicit, since it's just expressed by whether or not the most recent protobufs were copied into source. Maybe not "just", but significantly.

amundson (Tue, 26 Mar 2019 19:06:43 GMT):
I'm not sure what you are suggesting in a concrete form. Maybe pick a small subset and create the definition you have in mind.

kodonnel (Tue, 26 Mar 2019 19:20:45 GMT):
what I'm thinking about is not custom TPs or any TP-specific protobufs, but the core protos (CLIENT_BATCH_*, CONSENSUS_*, TP_*, etc.). So what's in, for example: https://github.com/hyperledger/sawtooth-core/tree/master/protos, https://github.com/hyperledger/sawtooth-sdk-java/tree/master/protos, https://github.com/hyperledger/sawtooth-sdk-rust/tree/master/protos, et al. The latter two are copies of the first (at some point in time). Wouldn't it be better and more explicit if the protos were split out and maintained separately, and then tagged versions pulled in by the SDKs at build time? Otherwise it seems like there is an opening there to get unintentional drift.

kodonnel (Tue, 26 Mar 2019 19:21:57 GMT):
It's a theoretical problem, I'll grant you. But I've seen stuff go haywire like that before.

arsulegai (Tue, 26 Mar 2019 19:46:31 GMT):
Topic: Discussion on how to handle the pending batches queue

arsulegai (Tue, 26 Mar 2019 19:47:59 GMT):
We've discussed a few solutions where we talked about re-broadcasting pending transactions to other validators if they've been waiting too long in the queue

amundson (Tue, 26 Mar 2019 19:48:11 GMT):
@kodonnel no, for the reasons I said above

arsulegai (Tue, 26 Mar 2019 19:48:24 GMT):
And probably execute it once, before re-broadcasting to other nodes

arsulegai (Tue, 26 Mar 2019 19:49:28 GMT):
How about another proposal where all batches, including the failed ones, become part of the block being constructed?

amundson (Tue, 26 Mar 2019 19:49:40 GMT):
Another option would be to expire it and let the client re-submit

arsulegai (Tue, 26 Mar 2019 19:50:47 GMT):
But the problem with such approaches is that each validator has its own timer to expire; from the client's point of view it's difficult to know which validator to trust

amundson (Tue, 26 Mar 2019 19:52:17 GMT):
I don't see how the validator vs. client resubmitting changes that trust.

kodonnel (Tue, 26 Mar 2019 19:53:11 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=w6WREshp746hQzK9a) @amundson Perhaps I'm being thick, but I'm not seeing how the suggestion conflicts with the intent of those reasons.

amundson (Tue, 26 Mar 2019 19:54:23 GMT):
Making failed transactions part of the block would allow for some additional denial of service attacks which we would then have to mitigate. Currently failed transactions increase CPU on the validator but don't cause additional network traffic.

arsulegai (Tue, 26 Mar 2019 19:55:32 GMT):
Let's say one of the validators asked the client to retry - but before the client could retry, another validator committed it into a block.

amundson (Tue, 26 Mar 2019 19:56:18 GMT):
@kodonnel the tagging would help mitigate it, but there is no point, since avoiding the duplication is optimizing for a problem that doesn't exist. the real problem we need to solve is reducing dependencies, not adding them.

amundson (Tue, 26 Mar 2019 19:57:01 GMT):
it's already possible to pull those files from tagged versions of core, if you like
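
For anyone who wants to do that, a minimal sketch in Python (the tag name and file list are assumptions; adjust to the files your SDK actually needs):

```python
# Fetch core proto definitions from a tagged sawtooth-core release at
# build time, rather than keeping a copy in the SDK repository.
import os
import urllib.request

TAG = 'v1.1.4'  # hypothetical tag name; use the release you target
PROTOS = ['transaction.proto', 'batch.proto', 'events.proto']
BASE = f'https://raw.githubusercontent.com/hyperledger/sawtooth-core/{TAG}/protos'

os.makedirs('protos', exist_ok=True)
for name in PROTOS:
    # Download each proto file pinned to the chosen tag
    urllib.request.urlretrieve(f'{BASE}/{name}', os.path.join('protos', name))
    print(f'fetched {name} at {TAG}')
```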

arsulegai (Tue, 26 Mar 2019 19:57:24 GMT):
hmm! I agree that in cases like PoET there will be increased network traffic, because each validator would have removed these invalid transactions from their queue. Probably there's a way to mitigate it by storing invalid transaction status in txn receipts?

kodonnel (Tue, 26 Mar 2019 19:58:03 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=tLEa9noq5Nqr5YH3K) @amundson Honestly that's actually more palatable than the way the SDKs keep copies in source.

kodonnel (Tue, 26 Mar 2019 19:58:58 GMT):
there should be one source of truth for that serialization format.

amundson (Tue, 26 Mar 2019 19:59:10 GMT):
@arsulegai well, it is a reasonable thing to send receipts along with the block, IMO. but I don't think we should do invalid receipts along with it. what problem are you trying to solve?

amundson (Tue, 26 Mar 2019 20:01:23 GMT):
@kodonnel I'd probably agree if it didn't come with a huge downside of an additional dependency, making it completely undesirable in practice. The copies could contain a comment within indicating where they were copied from, if tracking it back is the problem. (Though, commit message may do this just as well.)

arsulegai (Tue, 26 Mar 2019 20:01:48 GMT):
I am thinking of a case where we may need to scale the application and handle requests from multiple clients, from different locations, each of them connecting to different validators.

arsulegai (Tue, 26 Mar 2019 20:03:52 GMT):
Let's consider the cookiejar example. Initially: 100 cookies available. The following clients request simultaneously: Person 1 eats 70, Person 2 eats 25, Person 3 eats 25, Person 4 bakes 25.

kodonnel (Tue, 26 Mar 2019 20:03:59 GMT):
@amundson The dependency is really there already, its just not explicit.

arsulegai (Tue, 26 Mar 2019 20:04:28 GMT):
Person 1 through 4 are all connected to different validators, through different clients, they are not aware of each other

arsulegai (Tue, 26 Mar 2019 20:05:21 GMT):
The transaction done by Person 3 purely depends on whether his/her request was considered before Person 4 or not.
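
A worked version of that example (plain Python, amounts from the messages above), showing how Person 3's validity flips with ordering:

```python
# Simulate the cookiejar under two orderings of the same four requests.
def run(requests, cookies=100):
    results = []
    for who, action, n in requests:
        if action == 'bake':
            cookies += n
            results.append((who, 'valid'))
        elif n <= cookies:
            cookies -= n
            results.append((who, 'valid'))
        else:
            results.append((who, 'invalid'))  # not enough cookies
    return results

order_a = [('P1', 'eat', 70), ('P2', 'eat', 25), ('P3', 'eat', 25), ('P4', 'bake', 25)]
order_b = [('P1', 'eat', 70), ('P2', 'eat', 25), ('P4', 'bake', 25), ('P3', 'eat', 25)]

print(run(order_a))  # P3 invalid: only 5 cookies left when it runs
print(run(order_b))  # P3 valid: P4's bake landed first
```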

amundson (Tue, 26 Mar 2019 20:10:18 GMT):
@kodonnel by dependency I meant within the build system

amundson (Tue, 26 Mar 2019 20:11:05 GMT):
@arsulegai is that the end of your example, or is there more?

arsulegai (Tue, 26 Mar 2019 20:12:31 GMT):
It is the end of my example. I am thinking of describing what would happen in this scenario, case by case, with each consensus engine we have

arsulegai (Tue, 26 Mar 2019 20:21:44 GMT):
The problem would be the same without ordering the batches considered. The validator needs a way to communicate the list of considered batches to the other validators.

jsmitchell (Tue, 26 Mar 2019 21:13:25 GMT):
@arsulegai if person 3's txn is based on person 4's, they should wait until they have high confidence that person 4's txn is committed (which will vary based on consensus algo and questions of connectivity), and then submit their txn listing person 4's txn as a dependency.
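
A minimal sketch of declaring that dependency with the Python SDK (the 'cookiejar' family, addresses, and signer are illustrative; `prior_txn_id` would be person 4's transaction header signature, known to the submitter):

```python
# Build a transaction that lists a prior transaction in `dependencies`,
# so it is not considered for a block until that transaction commits.
import hashlib

from sawtooth_sdk.protobuf.transaction_pb2 import Transaction, TransactionHeader

def make_dependent_txn(signer, payload, prior_txn_id):
    """`signer` is assumed to be a sawtooth_signing Signer."""
    header = TransactionHeader(
        signer_public_key=signer.get_public_key().as_hex(),
        family_name='cookiejar',        # hypothetical TP family
        family_version='1.0',
        inputs=['1a5370' + '0' * 64],   # illustrative state address
        outputs=['1a5370' + '0' * 64],
        dependencies=[prior_txn_id],    # the key part: explicit ordering
        payload_sha512=hashlib.sha512(payload).hexdigest(),
        batcher_public_key=signer.get_public_key().as_hex(),
        nonce='',
    ).SerializeToString()
    return Transaction(
        header=header,
        header_signature=signer.sign(header),
        payload=payload,
    )
```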

jsmitchell (Tue, 26 Mar 2019 21:14:37 GMT):
otherwise, the only thing the network guarantees is consistent ordering, state agreement, and validity/invalidity of transactions in that context

jsmitchell (Tue, 26 Mar 2019 21:14:51 GMT):
the universe does not owe anyone a valid transaction

rjones (Tue, 26 Mar 2019 21:43:18 GMT):
Has left the channel.

arsulegai (Tue, 26 Mar 2019 21:54:38 GMT):
Right, that's the problem I'm worrying about

arsulegai (Tue, 26 Mar 2019 21:56:01 GMT):
If Person 3's transaction is rejected because Person 4's transaction has not arrived yet, then it should be OK. This is debatable, as long as it is consistently rejected on all the nodes.

arsulegai (Tue, 26 Mar 2019 21:56:30 GMT):
However, there shouldn't be a case where one node accepts it as valid and another node rejects it as invalid.

arsulegai (Tue, 26 Mar 2019 21:59:04 GMT):
Coming to DoS, we have permissioning and there's a way to control it. I believe we can address this inconsistency without worrying about it.

amundson (Tue, 26 Mar 2019 22:44:05 GMT):
@arsulegai if by "accepts" you mean it becoming part of the current chain, consensus is what handles that problem, in agreeing on the current head of the chain (which is a pointer to a block)

amundson (Tue, 26 Mar 2019 22:47:59 GMT):
I feel like you might be solving for problems in a non-BFT manner because you are currently using Raft, while the solutions currently in place (and some additional features that have been suggested) are focused on solving these things in a BFT-compatible way. Or, just as likely, I am misinterpreting how you would expect the "invalid transaction" information to be consumed by the other nodes.

amundson (Tue, 26 Mar 2019 22:58:39 GMT):
For removing invalid transactions, each node should run through and determine that they are invalid independently. I really think that it will be performant enough to do this periodically by running the pending queue through a scheduler on top of the current chain (independent of publishing cadence). I don't think you really gain much by sending hints from the leader on what is (potentially) invalid (and there are DoS issues with doing that).

arsulegai (Wed, 27 Mar 2019 03:26:51 GMT):
Ah! Let's take the above example with Raft as the consensus algorithm. The leader node rejects the transaction, and leadership changes to another node after executing Person 4's transaction.

arsulegai (Wed, 27 Mar 2019 03:28:02 GMT):
Person 3's transaction will get accepted this time. Even with periodic cleaning of pending batches, we may not completely solve this inconsistency.

arsulegai (Wed, 27 Mar 2019 03:29:49 GMT):
A transaction which was rejected maybe a few moments ago is now accepted.

arsulegai (Wed, 27 Mar 2019 03:33:19 GMT):
This is not only an issue with Raft; it's possible even in PoET or PBFT. Let's say there's a limit on the number of batches considered in a block. Person 4's transaction is before Person 3's transaction in validator 1, but in validator 2 it's the reverse. The block from validator 1 thinks Person 3's transaction is valid, but the block from validator 2 thinks Person 3's transaction is invalid.

arsulegai (Wed, 27 Mar 2019 10:02:16 GMT):
If the concern is about block processing/DoS, is it perhaps a good time to bring in the state checkpoint feature?

duncanjw (Wed, 27 Mar 2019 13:15:52 GMT):
Has joined the channel.

arsulegai (Wed, 27 Mar 2019 17:37:18 GMT):
^ @amundson @jsmitchell

arsulegai (Wed, 27 Mar 2019 20:12:34 GMT):
How about introducing one more exception? 1. Validator rejects a transaction, which is not retried. 2. TP rejects a transaction, which is not retried. 3. Invalid error, which is retried. Transactions rejected for case 1 are not added to the block, but transactions rejected for case 2 are added to the block.

arsulegai (Wed, 27 Mar 2019 20:13:38 GMT):
This will avoid DoS and help in letting other validators know the list of transactions considered to build the block.

amundson (Wed, 27 Mar 2019 21:42:29 GMT):
Why don’t you like the simpler approach of just processing the pending queue periodically?

arsulegai (Thu, 28 Mar 2019 06:32:26 GMT):
Periodic cleanup will not solve the issue I'm talking about; it will not solve the issue of one validator rejecting a transaction and another validator accepting it at a later point in time.

arsulegai (Thu, 28 Mar 2019 07:10:11 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=RraQcEyrDQKZ89AQe) @jsmitchell Consistent ordering is not taken care of in the case of invalid transactions!

arsulegai (Thu, 28 Mar 2019 07:10:47 GMT):
If there are clients in multiple locations, connected to different validators and sending requests, we do not have ordering!

arsulegai (Thu, 28 Mar 2019 07:11:22 GMT):
But thinking generally for a blockchain framework, this kind of distributed input scenario is common.

arsulegai (Thu, 28 Mar 2019 07:13:42 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=RhGtmfncdmXyhP8ku) @amundson No. The problem I discussed exists in all the consensus.

LeonardoCarvalho (Thu, 28 Mar 2019 10:17:36 GMT):
@arsulegai has a point; I did a study on a betting system, and that issue arose.

amundson (Thu, 28 Mar 2019 13:09:34 GMT):
I think it is completely unnecessary and irrelevant to order _invalid_ transactions.

amundson (Thu, 28 Mar 2019 13:11:50 GMT):
invalid transactions result in no state transitions within the system, and thus are completely irrelevant for reconstructing state. The _only_ purpose of transactions, consensus, and TPs in Sawtooth is to generate agreement on the list of transactions which modify state.

amundson (Thu, 28 Mar 2019 13:16:34 GMT):
TPs should also be stateless

jsmitchell (Thu, 28 Mar 2019 13:17:42 GMT):
furthermore, the only thing that makes a decision about transaction validity is the block publishing/validation process. The only place where a user would see a locally different answer (i.e. a transaction being considered valid by being included in a block and made the chain head) would be in a consensus which doesn't provide finality. Eventually consistent ordering at the network level is achieved at some level of confidence at a certain block depth. This is inherent in the design of the consensus algorithms and there is lots of literature on this. If you want to avoid this condition, use a consensus that provides finality guarantees and is well behaved wrt cliques.

amundson (Thu, 28 Mar 2019 13:19:17 GMT):
@arsulegai I'm trying to understand what is driving this desire. Do you have a TP that is stateful, and you are trying to run the exact sequence of transactions through it?

jsmitchell (Thu, 28 Mar 2019 13:20:44 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=vc7HjscNQmNMr8RvH) The system is designed to provide global ordering of valid transactions. If you care about the ordering of invalid transactions, recast them as valid transactions and track your application level validity within state.
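
A minimal sketch of that pattern with the Python SDK (the family name, address, and payload format are hypothetical): instead of raising `InvalidTransaction` for an application-level failure, the TP records the outcome in state, so the transaction stays valid at the blockchain level and is included exactly once.

```python
# Record application-level validity in state rather than rejecting.
import json

from sawtooth_sdk.processor.handler import TransactionHandler

JAR_NS = '1a5370'                 # illustrative 6-char namespace
JAR_ADDRESS = JAR_NS + '0' * 64   # illustrative 70-char state address

class CookieJarHandler(TransactionHandler):
    @property
    def family_name(self):
        return 'cookiejar'

    @property
    def family_versions(self):
        return ['1.0']

    @property
    def namespaces(self):
        return [JAR_NS]

    def apply(self, transaction, context):
        amount = json.loads(transaction.payload.decode())['eat']
        entries = context.get_state([JAR_ADDRESS])
        jar = json.loads(entries[0].data) if entries else {'cookies': 100}
        if amount <= jar['cookies']:
            jar['cookies'] -= amount
            jar['last_result'] = 'ok'
        else:
            # Application-level rejection recorded in state -- no
            # InvalidTransaction raised, so the txn is still "valid"
            # to the validator and its outcome is globally agreed.
            jar['last_result'] = 'rejected: not enough cookies'
        context.set_state({JAR_ADDRESS: json.dumps(jar).encode()})
```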

jsmitchell (Thu, 28 Mar 2019 13:48:13 GMT):
It is important to understand that an 'invalid transaction: reason' answer from the validator you submitted a client transaction to is an answer 'local' to that validator and is only evaluated as part of block publishing. Before that happens, that transaction will already have been distributed to the network. At some point in the future, it may be considered valid and included in a block due to some other state change which makes it valid.

arsulegai (Fri, 29 Mar 2019 04:02:20 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=HSpbJNirTnyDmRHtx) @amundson Yeah! I mean even in the case of the cookiejar example I told you about, it is a stateful application. The result of a request depends on the number of cookies currently available.

arsulegai (Fri, 29 Mar 2019 04:04:59 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=dGPHMWfPdHA3BtYjM) @jsmitchell The merkle tree may grow to a large size. How about considering TP rejections separately from validator rejections? Propagate TP-rejected transactions in the block and let all other validators try.

arsulegai (Fri, 29 Mar 2019 04:07:22 GMT):
@jsmitchell @amundson Looking through the suggestions, I think you have understood the concern. There's a possibility, as Mitchell mentioned, to consider all invalid transactions as valid and store their status somewhere in global state. Wouldn't that be a workaround solution that masks the ordering issue of the block?

arsulegai (Fri, 29 Mar 2019 04:09:19 GMT):
And as I explained earlier, this issue occurs with all the consensus engines we have. I'm open to your comments and suggestions.

LeonardoCarvalho (Fri, 29 Mar 2019 10:26:19 GMT):
Maybe a 2-phase validation mechanism?

amundson (Fri, 29 Mar 2019 14:09:45 GMT):
@arsulegai a stateful application is not the same as a stateful TP. (All sawtooth apps are stateful, that's kind of the point.) A stateful TP would be one that stores state outside of the validator and attempts to predict validator behavior to manage it rather than implementing a clean state transition function. That is explicitly not supported or desirable. The order of transactions given to the TPs for processing is also not guaranteed to be the same as contained within a block, in the case of the parallel scheduler. There is also no guarantee a transaction will be executed only one time. Some of this can be mitigated by selection of consensus (finality helps a bit), or addition of explicit dependencies, but such an app design will always be sensitive to changes in the validator.

arsulegai (Fri, 29 Mar 2019 19:28:05 GMT):
Ok, I'll take back the word ordering. I didn't understand the comment on stateful TPs. The case I was talking about is where the TP stores state in validator/global state.

arsulegai (Fri, 29 Mar 2019 19:33:59 GMT):
A TP executing a transaction at an arbitrary point in time is fine. But how do you address the question that there should not be multiple results for a single input?

arsulegai (Fri, 29 Mar 2019 19:38:30 GMT):
I see this issue even in PBFT. There's no consensus engine currently available that addresses this issue. Looking at the problem statement, to me it is a validator issue and not related to the consensus engines.

arsulegai (Sun, 31 Mar 2019 06:15:21 GMT):
@amundson ^

GiorgosT (Sun, 31 Mar 2019 14:57:45 GMT):
Has joined the channel.

amundson (Mon, 01 Apr 2019 01:40:34 GMT):
@arsulegai do you mean how to prevent the transaction from executing twice?

arsulegai (Mon, 01 Apr 2019 05:01:28 GMT):
Yes, prevent execution a second time (which could end up having a different result) without storing the status of the transaction in global state. Storing status explicitly from the TP seems to be a workaround to me.

arsulegai (Mon, 01 Apr 2019 09:50:54 GMT):
One more thing to note on why ordering within a block wouldn't be an issue: the input and output addresses will refer to the same global state address. Hence there won't be an issue with the parallel scheduling you referred to.

jsmitchell (Mon, 01 Apr 2019 14:37:41 GMT):
why 'without storing the status of the transaction in global state'? That solves the problem you are describing at the application layer. That way, applications which are sensitive to this condition can make the choice to do this, while not enforcing it on every possible use case.

arsulegai (Mon, 01 Apr 2019 15:21:27 GMT):
Reason 1: The number of transactions may grow big. If history is required, it'll be too much to maintain at the application layer. Reason 2: Almost all applications need this application-layer workaround, be it cookiejar, smallbank, or intkey for that matter. Reason 3: With state checkpoints, we can mitigate the blockchain growing big.

jsmitchell (Mon, 01 Apr 2019 15:25:09 GMT):
you are talking about recordkeeping the 'invalid' transactions one way or another

jsmitchell (Mon, 01 Apr 2019 15:30:39 GMT):
also, state checkpointing is completely unrelated to this. That would allow transfer and more rapid catchup for newly joined nodes or nodes which have been disconnected for some time. The size of current state is still the size of current state, whether or not we support state checkpointing. If you remove addresses from state or overwrite prior versions of address in the new version of state, we already have support for purging old entries (past a block horizon).

arsulegai (Mon, 01 Apr 2019 16:04:48 GMT):
I understand, a state checkpoint would help speed up catch-up. But because of the reason you mentioned, that state will grow large, it becomes a workaround for the application to handle. I'm worried that every application relying on global state would end up implementing this.

jsmitchell (Mon, 01 Apr 2019 16:15:59 GMT):
if you recordkeep the list of invalid transactions, they need to be stored somewhere, whether it's the blockstore or state. Growth of either of these is the same class of issue.

amundson (Mon, 01 Apr 2019 16:18:34 GMT):
requiring all the validators to process invalid transactions to validate a block is a complete non-starter

arsulegai (Mon, 01 Apr 2019 16:24:19 GMT):
A transaction is not invalid unless the TP on that validator executes it

arsulegai (Mon, 01 Apr 2019 16:24:43 GMT):
That's when I brought up the proposal of splitting validator rejections vs. TP rejections

amundson (Mon, 01 Apr 2019 16:31:21 GMT):
yes, I understand the distinction you are making, but structurally-invalid transactions/batches are already rejected very early on by the validator and don't make it as far as gossiping or the pending queue.

arsulegai (Mon, 01 Apr 2019 16:33:10 GMT):
They are executed either when the block is constructed or when it is getting validated, right?

arsulegai (Mon, 01 Apr 2019 16:35:57 GMT):
I agree with you that it's somewhat of a structural change. But it's worth exploring and beneficial.

amundson (Mon, 01 Apr 2019 18:14:56 GMT):
Moving a conversation from #sawtooth-pr-review ... @danintel has a PR up for seth - https://github.com/hyperledger/sawtooth-seth/pull/95/files - that uses curl to pull down the necessary apt keys. The advantage of this, if I understand correctly, is that because curl behaves properly with respect to proxy variables, we can reduce/avoid the proxy variable stuff we are currently maintaining.

amundson (Mon, 01 Apr 2019 18:17:36 GMT):
I think if we can avoid the huge blocks of proxy steps in Dockerfiles by taking this approach, it seems like the most appealing approach suggested thus far. Any additional thoughts?

rbuysse (Mon, 01 Apr 2019 18:36:59 GMT):
apparently that approach doesn't work for npm

rbuysse (Mon, 01 Apr 2019 20:15:47 GMT):
@Nonj We just published a 1.1.4.post1 version of the python sawtooth-sdk to fix the protobuf issue. Sorry you were impacted!

kumble (Tue, 02 Apr 2019 01:20:32 GMT):
Has joined the channel.

Nonj (Tue, 02 Apr 2019 01:50:35 GMT):
@rbuysse thanks a ton!

arsulegai (Tue, 02 Apr 2019 03:20:58 GMT):
@rbuysse what's the error with npm?

arsulegai (Tue, 02 Apr 2019 03:23:33 GMT):
@amundson I wanted a conclusion for the earlier topic: do we consider a potential solution to solve it?

Mohit_Python (Tue, 02 Apr 2019 04:39:07 GMT):
Has joined the channel.

rbuysse (Tue, 02 Apr 2019 18:10:21 GMT):
@arsulegai I'm not sure what the issue is. @danintel was looking at it

Dan (Tue, 02 Apr 2019 21:19:42 GMT):
What do we want out of Ursa? Here's my starter list, in order of tractability more than priority:
1. painless update (to edwards curve) for faster signature operations
2. aggregated signatures for use with PBFT c.f. https://github.com/hyperledger/sawtooth-rfcs/pull/30
3. Confidential transactions - by which I mean that the contents or some field of the contents is somehow hidden (encrypted, committed to, etc.), but can still be meaningfully verified in TP logic (e.g. range proofs).
4. Private transactions - by which I mean concealing the identity of the participants. (I don't know that there is any mechanism for this.)
There are plans for anonymous credentials, e.g. proving you possess some attribute like being a member of a bank or carrying a certain credit line. I don't know where that capability factors into priorities for the rest of you.

amundson (Tue, 02 Apr 2019 21:25:32 GMT):
@Dan anonymous credentials are interesting for Grid. would like to see a hello world.

amundson (Tue, 02 Apr 2019 21:30:44 GMT):
For signing, I think the highest priority has to be support for all the languages sawtooth uses. Without that, we can't achieve your #1, because we can't adopt Ursa without major caveats.

amundson (Tue, 02 Apr 2019 21:32:41 GMT):
How would aggregated signatures enhance the PBFT approach in that RFC? That would mostly seem to be a problem of network communication, not a signing library problem.

amundson (Tue, 02 Apr 2019 21:36:08 GMT):
#3 and #4 on your list probably require architectural discussion for Sawtooth before we would know what we need from Ursa?

eugene-babichenko (Wed, 03 Apr 2019 08:45:28 GMT):
Hello, I have a question regarding a possible bug in the Sawtooth validator. We emailed it several times to the team but haven't received any response, so I guess I might receive one faster here. We are now doing memory profiling of the validator to see what is actually going on when the issue occurs. Hoping for your cooperation on this issue. Below is the text of our report. I am also curious if it might be related to this issue: https://jira.hyperledger.org/browse/STL-1505.

Steps to reproduce: Spin up a network of Sawtooth nodes with the settings provided below. Emulate a network connectivity problem on one of the nodes. This can be done by simply disabling the network connection or by bringing the other nodes down. After that, the node will start attempts to reconnect to its peers. While attempting to reconnect, the node starts to consume more and more memory over time. In the last case, the memory loss was about 5 MB per hour. After a successful reconnect, the memory is not freed. Another issue with such a configuration is that a node will keep only the number of peers provided in `minimum_peer_connectivity` and has trouble finding more peers, but this may be a networking issue or an issue with our old development setup; we will investigate it further.

The validator configuration:
```
peering = "dynamic"
minimum_peer_connectivity = 3
scheduler = "parallel"
```
We also provide at least 3 seeds.

Our environment: We run the validator in a Docker container based on Ubuntu Xenial, with the validator installed from the Bumper repository (version 1.1.4). The Python version is 3.5.2.

Our assumptions: We assume that this issue is related to the process of reconnection attempts (as it creates a lot of OutboundConnection instances), but we haven't found the exact point where the memory leaks yet. Probably some of the objects are not collected by the Python GC, or memory is leaking from the unsafe Rust part of the application.

amundson (Wed, 03 Apr 2019 14:09:22 GMT):
@eugene-babichenko I don't think it is necessarily the same as STL-1505, because that was filed due to apparent memory leaks occurring over a longer period of time on an otherwise stable network.

jsmitchell (Wed, 03 Apr 2019 14:14:44 GMT):
I doubt this is rust code. All the connection attempts are handled in python, currently.

Kirill_Vusik (Wed, 03 Apr 2019 15:23:04 GMT):
Has joined the channel.

eugene-babichenko (Wed, 03 Apr 2019 15:30:18 GMT):
Thank you for the responses. Running vprof on that code showed me that after 4 hours of work on an unstable network there are around 6000 instances of `sawtooth_validator.networking.future.Future`. Also there are objects that are probably related to that, like `TimerContext`, `deque` and `threading.Condition` (also 5k-6k instances each). I guess that those futures are still referenced from somewhere or are ignored by the GC for some reason. Now I'm taking a deeper look into that.

jsmitchell (Wed, 03 Apr 2019 16:10:24 GMT):
they are almost certainly the futures associated with the connection attempts and are never being resolved due to the lack of a response from the other side. They probably need timeouts added so that they can be removed.
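
Illustrative only, not the validator's actual future implementation: a sketch of the suggested fix, tracking a creation time per pending future so unresolved entries can be reaped after a TTL instead of accumulating while a peer is unreachable.

```python
# Sketch: reap pending futures that never received a response.
import time
import threading

class FutureCollection:
    def __init__(self, ttl=30.0):
        self._ttl = ttl
        self._lock = threading.Lock()
        self._pending = {}  # correlation_id -> creation time

    def put(self, correlation_id):
        with self._lock:
            self._pending[correlation_id] = time.time()

    def resolve(self, correlation_id):
        # Called when a response arrives; forget the pending entry.
        with self._lock:
            self._pending.pop(correlation_id, None)

    def reap_expired(self):
        """Drop futures that never got a response within the TTL."""
        cutoff = time.time() - self._ttl
        with self._lock:
            expired = [cid for cid, t in self._pending.items() if t < cutoff]
            for cid in expired:
                del self._pending[cid]
        return expired
```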

Dan (Wed, 03 Apr 2019 19:43:53 GMT):
@rberg2 I think you just fixed this in another repo right? I don't recall which one tho and I'd like to cross check the syntax. https://github.com/hyperledger/sawtooth-poet/pull/26

rberg2 (Wed, 03 Apr 2019 19:46:11 GMT):
yes! I have added that back to a few repos

rberg2 (Wed, 03 Apr 2019 19:47:53 GMT):
here is one https://github.com/hyperledger/sawtooth-sabre/commit/cfe050bb0d542ec4911cc40b2ce1d86bb336873a

Nonj (Fri, 05 Apr 2019 21:55:06 GMT):
Hey all! I'm currently trying to update to sawtooth 1.1.4 for the NEXT hyperdirectory project. After updating the sawtooth-validator image from 1.0.5 to 1.1.4, it seems as though I need to fix some configurations as well. Is there an upgrade checklist somewhere that I can check out to see what else I need to modify? Thank you in advance!

danintel (Fri, 05 Apr 2019 23:03:26 GMT):
@Nonj See #sawtooth channel for an answer. This channel is for Sawtooth core development.

manojgop (Sat, 06 Apr 2019 10:24:49 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=TpwZ4TRCosjDGeKWi) @amundson @jsmitchell What's your suggestion for a client using sawtooth to solve the issue where multiple clients get different statuses for a transaction (as valid or invalid) based on which validator endpoint the clients send the transaction to (via the rest api)? In certain scenarios (as @arsulegai mentioned above in the cookie jar example https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=tLdPedmukKFTNh7do) clients cannot depend on other clients for transaction ordering. Since the order of execution of transactions in a block is not fixed in sawtooth, different validators can get different execution statuses for the transaction (from the TP). This is independent of the consensus mechanism we use. 1) Do you think we can support this by making changes to the sawtooth validator, or is it the responsibility of the client to handle this? For example, use serial ordering for execution of transactions in a block that is published by nodes, or let the client application handle it by sending all requests from the other clients to a batcher which can order the transactions in a batch before sending it to a validator? 2) How are other blockchain frameworks handling this scenario?

amundson (Sat, 06 Apr 2019 13:01:16 GMT):
@manojgop The expectation is that the submitting client will query the validator that it used to submit the batch. The idea that multiple clients are coordinating to know the batch id to request status across the network seems contrived, because the coordination implies sideband communication. If you have sideband communication, you also have a way to talk about validity sideband.
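
For reference, that query is the REST API's `/batch_statuses` endpoint; a minimal client-side sketch (the URL is an assumption for wherever the rest-api is reachable):

```python
# Poll the batch status on the same validator the batch was submitted to.
import requests

REST_API = 'http://localhost:8008'  # assumed rest-api location

def wait_for_status(batch_id, timeout=10):
    resp = requests.get(
        f'{REST_API}/batch_statuses',
        params={'id': batch_id, 'wait': timeout},
    )
    resp.raise_for_status()
    # One of: COMMITTED, INVALID, PENDING, UNKNOWN
    return resp.json()['data'][0]['status']
```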

amundson (Sat, 06 Apr 2019 13:14:29 GMT):
There is a lot more complexity in batch status than the topics we have discussed here, and at least with a forking consensus like poet there are edge cases. Some of those exist with any consensus, such as a validator restart potentially causing the loss of information about a batch because it was either invalid or in the pending queue. That makes batch status unknowable without complete network knowledge (and if you have complete access to all validators on the network, it doesn’t sound like a blockchain use case)

amundson (Sat, 06 Apr 2019 13:19:10 GMT):
One idea to help develop or experiment with some more enhancements would be to create the ability to subscribe to a stream of transaction/batch state transitions. An app could persist this information to help solve the restart issue. So, a sample “event” on that stream might be (batch-id-a, old-status, new-status).
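
Purely hypothetical, since such a stream is only being proposed here, but a consumer might persist those transitions so batch status survives a validator restart, along these lines:

```python
# Hypothetical consumer of a (batch-id, old-status, new-status) stream;
# persisting transitions lets an app reconstruct status after a restart.
import sqlite3

def record_transition(db, batch_id, old_status, new_status):
    db.execute(
        'INSERT INTO batch_transitions (batch_id, old_status, new_status) '
        'VALUES (?, ?, ?)',
        (batch_id, old_status, new_status),
    )
    db.commit()

db = sqlite3.connect('batch_status.db')
db.execute(
    'CREATE TABLE IF NOT EXISTS batch_transitions '
    '(batch_id TEXT, old_status TEXT, new_status TEXT)'
)

# for event in subscribe_to_batch_transitions():  # hypothetical stream
#     record_transition(db, *event)
```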

manojgop (Sat, 06 Apr 2019 15:27:01 GMT):
@amundson Are you suggesting to handle this issue at the application level? I was thinking that even if a client uses the event subscription mechanism, some clients may get a valid status for a txn/batch whereas others might get an invalid status, based on the order of execution of transactions. This happens because each client is sending transactions independently of the other clients; if there is no batcher client present to order these transactions, and each validator executes transactions in an arbitrary order (parallel execution), then the blockchain framework can't guarantee the status of a transaction. So from the client's perspective the logic becomes complex.

manojgop (Sat, 06 Apr 2019 15:28:41 GMT):
Do you see this as a generic problem with any blockchain framework which is distributed/decentralized, and hence it has to be handled at the application layer? Or can we make some changes in the validator to handle transaction execution and status in a more deterministic way?

jsmitchell (Sat, 06 Apr 2019 15:54:11 GMT):
I have already suggested a solution to this that you can use today. Use a consensus that supports finality like PBFT and treat all your transactions as valid at the blockchain level. If they are “invalid” at the app level, record their result differently in state.

jsmitchell (Sat, 06 Apr 2019 16:08:21 GMT):
Transactions are only processed when included in a block. By declaring all transactions as “valid”, you ensure they will always be included in the block they are first considered in. The use of a finality-supporting consensus ensures that there is no disagreement about chain head (i.e. no forks which might have a different ordering)

arsulegai (Sat, 06 Apr 2019 17:46:37 GMT):
You're right @jsmitchell, a class of problem statements is handled in finality-based consensus engines like PBFT / Raft. Still, the issue we're discussing here persists in PBFT. The concern is not about ordering transactions within a block, but rather about which transactions are considered while constructing the blocks.

arsulegai (Sat, 06 Apr 2019 17:49:04 GMT):
A transaction is decided as 'valid'/'invalid' by a particular validator; in the case of a finality-based consensus engine, all other validators are also supposed to give the same result as the validator which is proposing the block.

arsulegai (Sat, 06 Apr 2019 17:52:49 GMT):
Storing the status of a transaction in global state instead of rejecting it is a potential solution to the problem. I'm not denying that, and it works perfectly.

arsulegai (Sat, 06 Apr 2019 17:55:15 GMT):
The question, however, is that this is a problem in all applications. Do we push such a strong requirement onto Sawtooth application developers? Or, if possible, can we consider a solution from the framework itself?

arsulegai (Sat, 06 Apr 2019 18:05:34 GMT):
@amundson ^

arsulegai (Sat, 06 Apr 2019 18:24:58 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=Q6uVf9yWvJNSSfA9Uv) These problems can be considered in a separate discussion. With a client that has retry ability, validator restart scenarios could be mitigated. However, if pending batches are not cleared over a long period, they can be re-broadcast to other validators. If other validators already have these transactions, they'll ignore the duplicate entry.

jsmitchell (Sat, 06 Apr 2019 19:39:34 GMT):
You are understanding half of my suggestion @arsulegai

jsmitchell (Sat, 06 Apr 2019 19:46:08 GMT):
I have offered a solution that involves considering transactions for publication once and only once. It requires both finality-supporting consensus and making a single decision on inclusion of transactions.

jsmitchell (Sat, 06 Apr 2019 19:52:17 GMT):
Sawtooth is going to continue to support forking consensus, so it is not going to be able to provide the guarantees you are talking about to the client. Even if we communicated information about invalid transactions considered along with a block, that block itself could be on a fork that is abandoned in the future. End result is a changing answer to the client.

manojgop (Sun, 07 Apr 2019 03:38:00 GMT):
@jsmitchell So are you suggesting this solution as a temporary or permanent approach that can be adopted by all clients (i.e., to treat all transactions as valid at the blockchain level and, if they are “invalid” at the app level, record their result differently in state)? Do you mean that making generic changes at the sawtooth core level to give deterministic behavior for transaction status to clients is not feasible, since sawtooth has to support both forking and non-forking/finality-based consensus? I'm trying to understand whether this issue is specific to sawtooth's design and/or whether it is applicable to all blockchain frameworks (which are decentralized and distributed) that use non-forking/finality consensus like PBFT.

arsulegai (Sun, 07 Apr 2019 06:07:30 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=GWLyv2qMGbv83ySc34) @jsmitchell Thanks, it works; almost all applications will require such a solution.

arsulegai (Sun, 07 Apr 2019 06:08:48 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=vdttBmpoMVFooYJifW) @jsmitchell Forking consensus engines will have this issue anyway. How about making it possible for non-forking engines?

pankajcheema (Sun, 07 Apr 2019 08:21:37 GMT):
Only you can see this message

pankajcheema (Sun, 07 Apr 2019 08:22:13 GMT):
I am getting this message when posting in the #sawtooth channel; any suggestions please?

arsulegai (Sun, 07 Apr 2019 10:29:04 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=tpiPdp6EvJk5HM85t) @pankajcheema I've replied to your question in #sawtooth channel

manojgop (Tue, 09 Apr 2019 04:25:51 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=a4f4c4b9-1557-479b-989a-6fbe332c0571) @arsulegai @jsmitchell I think the method proposed works only if we use serial scheduling. In the case of parallel scheduling, different validators/TPs can update the state differently. A transaction can be marked as valid in state by one validator/TP, and another validator/TP can mark it as invalid, based on the order in which transactions are executed in the batch. In the cookie jar example, the clients are not adding any transaction dependencies when they submit transactions/batches. So if we expect all validators/TPs to give a deterministic result, we need to use serial scheduling.

jsmitchell (Tue, 09 Apr 2019 11:16:14 GMT):
Not true. The parallel scheduler preserves context ordering based on overlaps of inputs/outputs. The contexts provided are identical, and the state transition is therefore 100% deterministic. Nothing can flip from being valid to invalid or vice versa.
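
A toy sketch of that property (plain Python, addresses illustrative): two transactions must keep their relative order whenever their declared inputs/outputs overlap, which is what guarantees identical contexts on every validator.

```python
# Decide whether two transactions conflict based on declared
# input/output address sets, as the parallel scheduler does conceptually.
def must_stay_ordered(txn_a, txn_b):
    """True if either txn writes anything the other reads or writes."""
    a_in, a_out = set(txn_a['inputs']), set(txn_a['outputs'])
    b_in, b_out = set(txn_b['inputs']), set(txn_b['outputs'])
    return bool(a_out & (b_in | b_out)) or bool(b_out & (a_in | a_out))

jar = {'inputs': ['1a5370'], 'outputs': ['1a5370']}    # touches the cookie jar
other = {'inputs': ['2b9f00'], 'outputs': ['2b9f00']}  # disjoint address

assert must_stay_ordered(jar, dict(jar))  # same address: order preserved
assert not must_stay_ordered(jar, other)  # disjoint: safe to run in parallel
```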

arsulegai (Tue, 09 Apr 2019 13:34:45 GMT):
Hmm, then we can think of a solution for the scenario for non-forking consensus engines.

amundson (Tue, 09 Apr 2019 15:59:58 GMT):
@arsulegai @manojgop given the amount of discussion here and confusion (in particular, on how things currently work and what you can/can't do w/BFT and forking consensus), I think we will need RFCs to move forward. @ltseeley @jsmitchell and myself will probably submit a couple of RFCs short-term related to these discussions. That should help us capture a lot of this discussion and iterate on it until we agree on how to proceed.

arsulegai (Tue, 09 Apr 2019 16:17:09 GMT):
Awesome! Thanks

arsulegai (Wed, 10 Apr 2019 06:53:16 GMT):
Topic: allowing a batch with an empty transaction list in a block.

arsulegai (Wed, 10 Apr 2019 06:53:53 GMT):
I have a question here: do we need to allow batches with an empty transaction list to be put in a block? If so, what's the use case?

manojgop (Wed, 10 Apr 2019 08:33:18 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=ZAXzuHLXFA3JPs2Xg) @amundson Wonderful! That will help.

Dan (Wed, 10 Apr 2019 17:45:54 GMT):
My takeaway from the invalid transaction discussion was that 1) it is satisfiable with application design using Sawtooth as-is, and 2) it may be an easier app-developer experience if we added features, such as letting a TP return "Rejected Transaction" so that the transaction is handled by Sawtooth like a valid transaction in terms of block creation. That would address the risk of an invalid transaction being reconsidered at some point in the future when it is no longer desirable. It may also satisfy the client application's need for a response, but I understand that problem less. Adding another response type would create other issues that I won't ramble about here, but I do think it's worth considering how easy our app developer experience is.
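
For reference, today a Python TP signals an invalid transaction by raising an exception; the "Rejected Transaction" outcome discussed above would be a new, third response type. A minimal sketch of the current behavior (the handler details and `parse_payload` are hypothetical):
```python
from sawtooth_sdk.processor.exceptions import InvalidTransaction
from sawtooth_sdk.processor.handler import TransactionHandler

class CookieJarHandler(TransactionHandler):
    # family_name, family_versions, and namespaces omitted for brevity

    def apply(self, transaction, context):
        count = parse_payload(transaction.payload)  # hypothetical helper
        if count <= 0:
            # Today this marks the transaction invalid, and it may be
            # reconsidered later; the proposed "Rejected" outcome would
            # instead record it in a block so it never comes back.
            raise InvalidTransaction("count must be positive")
        # otherwise update state, e.g. via context.set_state({...})
```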

arsulegai (Wed, 10 Apr 2019 18:10:18 GMT):
@jsmitchell ^ I've another interesting question: it's regarding a block with a batch, where the batch has no transactions.

amundson (Wed, 10 Apr 2019 18:59:50 GMT):
@arsulegai we probably shouldn't allow empty transaction lists in batches

Dan (Wed, 10 Apr 2019 19:14:54 GMT):
I think the basis for that question is keeping blocks flowing when there are no txns, e.g. for PoET 2? Might be a way to use something like blockinfo to generate an automatic transaction, or issue a transaction from consensus. As far as an empty transaction goes, I recall a discussion a long time ago about having transactions that were successful but did not write anything back to state. I don't recall if there is anything which enables/prevents that.

amundson (Wed, 10 Apr 2019 19:18:29 GMT):
or just allow empty blocks

amundson (Wed, 10 Apr 2019 19:19:03 GMT):
the question though was about batches with an empty transaction list

arsulegai (Thu, 11 Apr 2019 02:38:32 GMT):
Right @Dan, that would have been my next topic/question: is there a way to create an empty block to keep the flow in PoET 2? The current question was to understand the validator's behavior and see what it has to offer. We observed a case where blocks were added with batches, but the batches had no transactions in them.

Dan (Thu, 11 Apr 2019 15:26:32 GMT):
what does the consensus rfc say about initiating a block? Does it place any requirements on there being batches/transactions to be processed?

jsmitchell (Thu, 11 Apr 2019 15:38:51 GMT):
it shouldn't say anything about that

jsmitchell (Thu, 11 Apr 2019 15:40:18 GMT):
in my opinion, an empty block should be valid, as should the possibility of creating a block which causes injection (e.g. blockinfo) where the only batches/txns in the block would be injected contents.

jsmitchell (Thu, 11 Apr 2019 15:40:32 GMT):
empty batches are a bug

jsmitchell (Thu, 11 Apr 2019 16:00:30 GMT):
@arsulegai that JIRA issue you just commented on - Adam is no longer contributing to the project, so feel free to take ownership of the bug

arsulegai (Thu, 11 Apr 2019 16:18:46 GMT):
@Dan the consensus RFC says that if finalize_block is successful, a new block is sent out

arsulegai (Thu, 11 Apr 2019 16:19:31 GMT):
@jsmitchell I guess the current behavior makes the validator consider it an error scenario, and an empty block is not created. Did I miss something, or is there a way to do it?

arsulegai (Thu, 11 Apr 2019 16:21:15 GMT):
Wait! I didn't check what happens if there are BlockInfo injector batches in it

jsmitchell (Thu, 11 Apr 2019 16:22:04 GMT):
block publishing is driven by consensus based on the presence of work in the pending queue. If there are no valid items in the pending queue a block won't be produced. This is a different question than "are empty blocks valid?". When I said "in my opinion" above, I was expressing how I think the system should work, not describing how it does work. Again, in my opinion, consensus should be in complete control of the decision to publish a block, even if there aren't batches in the pending queue.

arsulegai (Thu, 11 Apr 2019 16:34:14 GMT):
I agree with you. A possible missing piece to make it complete could be the ability to dynamically decide which batches are to be added to the block: something like letting the consensus engine add batches when summarizing, or making batch injection a separate component.

arsulegai (Thu, 11 Apr 2019 16:58:26 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=mkdbGsWWgxQDXC3ns) @jsmitchell @ltseeley a quick question on this: what was maximum_peer_connectivity introduced for? Is it tracking the total number of peering connections, or the total number of peers itself? Example: if P1, P2, and P3 are the only validators in the network, can maximum_peer_connectivity be set to 2?

pschwarz (Fri, 12 Apr 2019 18:20:46 GMT):
If you have an outstanding PR on sawtooth-core, you will need to rebase to pick up some linting-related fixes that are unrelated to your code.

ltseeley (Fri, 12 Apr 2019 18:30:40 GMT):
@arsulegai I can't say for sure what the _intended_ purpose of the setting is, but what it _does_ is track the number of connections.

arsulegai (Fri, 12 Apr 2019 19:40:51 GMT):
I'm thinking of making it the maximum number of peers supported

arsulegai (Fri, 12 Apr 2019 19:41:10 GMT):
As it sounds

arsulegai (Fri, 12 Apr 2019 19:41:29 GMT):
Would that be ok?

Dan (Fri, 12 Apr 2019 21:00:00 GMT):
You mean you are trying to document the meaning? Or you want to make a change to its functionality?

rjones (Sat, 13 Apr 2019 00:07:40 GMT):
Has joined the channel.

rjones (Sat, 13 Apr 2019 00:07:49 GMT):
Could I ask Sawtooth maintainers to please join https://lists.hyperledger.org/g/maintainers ? I'd appreciate it.

arsulegai (Sat, 13 Apr 2019 04:42:12 GMT):
@Dan The variable is either not fully documented or doesn't follow its apparent meaning. I'm thinking of making a code change. Is that allowed? Many were confused to learn that it's not just the total number of peers.

jimbarritt (Sat, 13 Apr 2019 13:13:41 GMT):
Has joined the channel.

Dan (Sat, 13 Apr 2019 20:33:39 GMT):
I think it is implemented correctly, but might not be clearly explained. The idea is to cap the number of connections that your server will accept. If you imagine your server gets widely advertised across a 100-node network, your server could get overwhelmed. This parameter caps the connections so that it will never connect to more than n other nodes. The comment in the gossip.py file complements the sysadmin guide, but maybe this explanation makes the definition more clear. gossip.py: ``` maximum_peer_connectivity (int): The validator will reject new peer requests if the number of connected peers reaches this threshold. ``` validator_configuration_file.rst: ```The maximum number of peers that will be accepted.``` Implicitly, but perhaps not clearly, that means the max number of peers that will be accepted _by this validator_.
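
A minimal sketch of the cap described there (illustrative Python, not the exact gossip.py logic):
```python
class PeeringException(Exception):
    """Raised when a peer request cannot be honored (sketch only)."""

class GossipSketch:
    def __init__(self, maximum_peer_connectivity):
        self._peers = {}  # connection_id -> endpoint
        self._max_peers = maximum_peer_connectivity

    def register_peer(self, connection_id, endpoint):
        # Reject new peer requests once the configured threshold is hit,
        # so a widely advertised node cannot be overwhelmed.
        if len(self._peers) >= self._max_peers:
            raise PeeringException(
                "At maximum configured peers: {}".format(self._max_peers))
        self._peers[connection_id] = endpoint
```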

amundson (Mon, 15 Apr 2019 14:16:11 GMT):
We are working on a HL project proposal for a new transaction execution library (tentatively called Transact). Sawtooth and Grid will use this library. Anyone interested, please review - https://docs.google.com/document/d/13d0cMReGOhK13BbdgMOFZy_prUzqWBXWc4nlI7mehpY/edit

arsulegai (Tue, 16 Apr 2019 03:35:10 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=REnDyaXe92iF5NbF7) @Dan The confusion comes because there can be multiple connections with the same peer. This variable considers them distinct; for example, there are cases where validator V1 connects to validator V2 three times but ignores a connection request from V3.

arsulegai (Tue, 16 Apr 2019 03:35:57 GMT):
My question was: should we consider multiple connections to the same peer as one, and allow other peers to join this validator?

jsmitchell (Tue, 16 Apr 2019 04:02:07 GMT):
Multiple connections between peers in either direction should be considered a bug

arsulegai (Tue, 16 Apr 2019 04:12:41 GMT):
Thanks for the clarification @jsmitchell @Dan

sah (Tue, 16 Apr 2019 04:24:51 GMT):
Has joined the channel.

Dan (Tue, 16 Apr 2019 12:44:40 GMT):
How do these peering variables work with Raft and PBFT, which expect fully connected networks? Do they just get overridden, or does the admin need to know to set them for a fully connected network?

arsulegai (Tue, 16 Apr 2019 13:27:31 GMT):
Admin needs to explicitly set peers such that exactly one peering connection is established between two nodes, and all peers are connected to each other. Example: if validator V1 starts first, validator V2 shall specify V1 as a peer, and validator V3 will have V1 & V2 as peers. The multiple-peering issue is seen if the peers lists also point the other way round, i.e. if V1 also lists V2 or V3 as a peer, or if V2 also lists V3 as a peer.

arsulegai (Tue, 16 Apr 2019 13:30:31 GMT):
But the above still doesn't explain why three peering connections are sometimes seen between two validators.

amundson (Tue, 16 Apr 2019 14:08:46 GMT):
I agree with @jsmitchell that multiple connections between peers are not the intended design

amundson (Tue, 16 Apr 2019 14:10:47 GMT):
I think this was "hidden" when we were only using PoET because we used dynamic peering more often than static.

manojgop (Wed, 17 Apr 2019 17:58:06 GMT):
@amundson That's right. If we take the example of Raft with static peering for a fully connected network, and follow the peering approach from the Raft documentation, we don't get multiple connections. For example, to start up a 3-node network where the nodes have endpoints alpha, beta, and gamma, you would do the following:
1. Start the alpha validator with no peers specified (but --peering static should still be used)
2. Start the beta validator with --peers tcp://alpha:8800
3. Start the gamma validator with --peers tcp://alpha:8800,tcp://beta:8800

manojgop (Wed, 17 Apr 2019 18:00:33 GMT):
But if we do the peering as alpha --peers beta,gamma; beta --peers alpha,gamma; and gamma --peers alpha,beta, then we get multiple peer connections between validators.

manojgop (Wed, 17 Apr 2019 18:04:21 GMT):
So is the suggestion to fix the code so that only a single connection is made between validators irrespective of how static peering is configured (which gives more flexibility in the peering configuration, for example in a docker-compose file), or to document it more clearly and leave the code as it is?

Dan (Wed, 17 Apr 2019 18:11:45 GMT):
@manojgop duplicate peers should be considered a bug. Alpha should have at most one connection with Beta.

amundson (Wed, 17 Apr 2019 19:03:52 GMT):
maybe not technically a bug since it's working as intended, but certainly we should fix it with an enhancement

amundson (Wed, 17 Apr 2019 19:05:24 GMT):
I'm probably overloading 'intended' there

amundson (Wed, 17 Apr 2019 19:06:46 GMT):
at an architectural level, it was the intent to have only one; what was developed was slightly different, but it's not buggy per se, it's just missing a feature to avoid the duplication.

jsmitchell (Wed, 17 Apr 2019 19:07:32 GMT):
I haven't looked at that code in a long time. I seem to recall it made an attempt to avoid duplicates, so maybe there is a race condition there with the handshake or something else. Probably a bug.

jsmitchell (Wed, 17 Apr 2019 19:08:23 GMT):
PRs to make it work properly per the described intent would be welcomed

Dan (Wed, 17 Apr 2019 20:08:54 GMT):
_it's not a bug. it just needs more features to make it work right._

jsmitchell (Wed, 17 Apr 2019 20:09:50 GMT):
I just looked at the code. There is some light checking at the network 'connection' level which only allows "inbound" connections identified by the connection id if there's not already an "outbound" connection with that identifier.

amundson (Wed, 17 Apr 2019 21:38:59 GMT):
@jsmitchell where is that?

jsmitchell (Wed, 17 Apr 2019 21:46:17 GMT):
https://github.com/hyperledger/sawtooth-core/blob/master/validator/sawtooth_validator/networking/handlers.py#L148

jsmitchell (Wed, 17 Apr 2019 21:46:42 GMT):
in the ConnectHandler

jsmitchell (Wed, 17 Apr 2019 21:48:00 GMT):
I suspect that either the connection id naming is not consistent inbound/outbound, or there are timing issues. In any case, it's not robust.

amundson (Wed, 17 Apr 2019 23:56:06 GMT):
I don't think that's what that code is trying to do. #notabugjustneedafeature

amundson (Wed, 17 Apr 2019 23:57:07 GMT):
Wouldn't you have to do this at a level where you know the peer id?

amundson (Wed, 17 Apr 2019 23:57:24 GMT):
which is in the "gossip" code

amundson (Thu, 18 Apr 2019 00:00:51 GMT):
@Dan I'm not saying this happened, but I could see someone in the past having argued that static peering should do what the user tells it to do and not get too clever.

Kirill_Vusik (Thu, 18 Apr 2019 09:23:44 GMT):
Hello everyone, I would like to start contributing to the Sawtooth source (and to Sawtooth Core in particular). I noticed that there is a *help-wanted* label in Jira, but the last task with that label was created last October. Is this label still relevant? May I start working on an unassigned task without this label?

LeonardoCarvalho (Thu, 18 Apr 2019 12:32:13 GMT):
hi all, if the guys at DAML are interested in a contractor, please let me know... ;)

Dan (Thu, 18 Apr 2019 12:49:27 GMT):
@Kirill_Vusik great to have your involvement! It is possible some of those issues are getting too old. Also very likely they just haven't been fixed yet because they are lower priority. We should scrub them soon. You are welcome to work on any Jira item though. Probably just a good idea to note in Jira that you are working on it and/or signal people here. That way we won't have two people solving the same problem.

Kirill_Vusik (Thu, 18 Apr 2019 13:26:09 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=ZWNaw8ByTEAPMchSK) @Dan Got it, thank you

Dan (Thu, 18 Apr 2019 13:46:41 GMT):
Let's say at some point I want to start a new chain by seeding it with the state of a different chain. Or I want to just excise off history for storage reasons. What would you call flattening state history? I don't think that's checkpointing. And it's probably more than 'pruning'.

jsmitchell (Thu, 18 Apr 2019 15:03:44 GMT):
definitely in the same neighborhood as checkpointing. the decision to 'start a new chain' makes a substantial difference in approach.

jsmitchell (Thu, 18 Apr 2019 15:04:57 GMT):
a crazy idea might be asserting the existence of a given state root hash at genesis

jsmitchell (Thu, 18 Apr 2019 15:05:41 GMT):
for this 'start a new chain' concept

jsmitchell (Thu, 18 Apr 2019 15:06:05 GMT):
a more conventional way to think about it would be a set of transactions that preload the trie

jsmitchell (Thu, 18 Apr 2019 15:06:10 GMT):
but that might be massive

Dan (Thu, 18 Apr 2019 15:22:32 GMT):
so like exporting state into a set of 'raw' transactions like `set(addy:123abc, value:b'fdc321')` for a genesis batch, as opposed to literally replaying the whole blockchain. I'm also thinking about a catastrophic-recovery thing that's not thought out enough to articulate. Thought I'd look around the literature but couldn't think of the correct term for this history-culling checkpointing.

jsmitchell (Thu, 18 Apr 2019 15:25:05 GMT):
checkpointing for a running chain is probably a lot more dynamic than you are thinking

jsmitchell (Thu, 18 Apr 2019 15:25:41 GMT):
but shares some properties with the 'crazy idea' above

jsmitchell (Thu, 18 Apr 2019 15:26:49 GMT):
capabilities including the transfer and hash validation of portions/all of state and using that as a base context for the validation of the next block based on that state

amundson (Thu, 18 Apr 2019 17:04:47 GMT):
@Kirill_Vusik we will have more entered into JIRA soon; in the meantime, there are quite a few things that could be worked on related to the rust transition. If you want something relatively easy, working on rewriting the CLI in rust would be good. If you want something a bit harder, rewriting the REST API in rust using actix would be great (it would allow us to optionally compile it into the validator itself or the grid daemon). Help with the transact library would also be great; for example, we need a parallel scheduler and additional execution adapters (https://github.com/bitwiseio/transact - warning, bleeding edge). If you want a super deep/hard problem, we need to prototype compiling PBFT consensus directly into the validator (by deep, I mean rewrite a chunk of the validator); if you want a research-level topic, there are features that could be put into PBFT. There are also endless things around the core, like enhancing the SDKs. For example, the Rust SDK's threading model needs to be enhanced.

Dan (Thu, 18 Apr 2019 18:47:34 GMT):
working to move RFC #32 along. I have commented on the open question about endpoints. I don't want to fracture the dialog into chat, but drawing attention to the update so you can comment there: https://github.com/hyperledger/sawtooth-rfcs/pull/32

Kirill_Vusik (Thu, 18 Apr 2019 18:51:28 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=BHsQpYjp7yTpKgXmf) @amundson thank you for the clarification, I will think it over

danintel (Thu, 18 Apr 2019 20:30:59 GMT):
I remember someone commenting on why Rust was favored over Go for Sawtooth. Both are similar: compiled languages with strict checking, extensive libraries, and good support for concurrency. So why Rust > Go?

jsmitchell (Thu, 18 Apr 2019 20:39:08 GMT):
rust's design provides extensive guarantees regarding memory safety, which are enforced at compile time. Rust's memory management is extremely predictable based on ownership rules and lifetimes, to the extent that the compiler can insert explicit allocation/deallocation instructions. Go uses a garbage collector and provides no such guarantees.

jsmitchell (Thu, 18 Apr 2019 20:39:53 GMT):
the go toolchain/build process is also not great

jsmitchell (Thu, 18 Apr 2019 20:40:34 GMT):
downsides of rust are that it can be challenging to learn and that it takes hours to compile hello world

Dan (Thu, 18 Apr 2019 21:02:19 GMT):
lol

amundson (Thu, 18 Apr 2019 21:04:27 GMT):
someone joked at one point that Go is where programmers go when they can't handle Rust, and that's why all the Rust crates are of better quality

amundson (Thu, 18 Apr 2019 21:04:45 GMT):
I don't think Rust is actually that hard though

amundson (Thu, 18 Apr 2019 21:05:32 GMT):
You just have to learn some new patterns that you should have probably been using before

jsmitchell (Thu, 18 Apr 2019 21:06:07 GMT):
the compiler, required syntax, and warnings have also improved substantially over the last several years

jsmitchell (Thu, 18 Apr 2019 21:06:19 GMT):
so, it's a lot easier than it used to be

amundson (Thu, 18 Apr 2019 21:06:36 GMT):
yes, that's why

jsmitchell (Thu, 18 Apr 2019 21:06:55 GMT):
basically it's "My First Rust" now

jsmitchell (Thu, 18 Apr 2019 21:06:58 GMT):
for babies

jsmitchell (Thu, 18 Apr 2019 21:13:10 GMT):
now the compiler does stuff like this:

jsmitchell (Thu, 18 Apr 2019 21:13:13 GMT):

Clipboard - April 18, 2019 4:13 PM

jsmitchell (Thu, 18 Apr 2019 21:14:41 GMT):
which is pretty damn cool, because it's the implication of a parse error

danintel (Thu, 18 Apr 2019 23:47:22 GMT):
(parentheses are your friend)

eugene-babichenko (Fri, 19 Apr 2019 14:39:22 GMT):
Hello everyone! I am trying to implement the `block_end` batch injector (which was not implemented in 1.1.4, from what I can see). It works fine for some time, but at a random point in time the validator stops accepting batches and they end up in the UNKNOWN status. What I am trying to do is call the `block_end` method from `CandidateBlock::summarize` and add the batches to the scheduler. So, before this line (https://github.com/hyperledger/sawtooth-core/blob/v1.1.4/validator/src/journal/candidate_block.rs#L346) I do the following:
```rust
let batches_by_block_end = self.poll_injectors(|injector: &cpython::PyObject| {
    // ...here comes a long boilerplate copied from add_batch
});
for b in batches_by_block_end {
    let batch_id = b.header_signature.clone();
    self.pending_batches.push(b.clone());
    self.pending_batch_ids.insert(batch_id.clone());
    self.scheduler.add_batch(b, None, true).unwrap();
}
```
Before the validator stops accepting new batches, it also outputs the following log entry: `[2019-04-19 14:29:17.236 DEBUG scheduler_parallel] Removed 8 incomplete batches from the schedule`. Do you have any suggestions about that?

jsmitchell (Fri, 19 Apr 2019 16:27:32 GMT):
what consensus are you using @eugene-babichenko ? how many nodes?

eugene-babichenko (Fri, 19 Apr 2019 16:37:16 GMT):
@jsmitchell We are using our own engine and this is a single node setup.

eugene-babichenko (Fri, 19 Apr 2019 16:37:45 GMT):
It can be reproduced on a larger network though

jsmitchell (Fri, 19 Apr 2019 16:38:29 GMT):
ok, so I believe that message means that the consensus engine instructed the block publisher to publish a block. When this happens, the publisher finalizes the schedule which aborts any uncompleted transactions in flight (thus, 'removed 8 incomplete batches from the schedule').

jsmitchell (Fri, 19 Apr 2019 16:39:25 GMT):
if the result is a schedule with 0 valid batches, I believe the publisher will abandon the block, which ultimately is probably the wrong thing to do

jsmitchell (Fri, 19 Apr 2019 16:39:51 GMT):
you will need to think about this in the context of injecting a batch at the end

eugene-babichenko (Fri, 19 Apr 2019 16:40:06 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=awzuaReBGLfjvAQPM) @jsmitchell Exactly. What confuses me is that all batches I post after that go to the UNKNOWN state.

jsmitchell (Fri, 19 Apr 2019 16:40:39 GMT):
what probably needs to happen is that the schedule needs to be finalized, and then added to with that injected transaction, which will require some changes

jsmitchell (Fri, 19 Apr 2019 16:41:36 GMT):
I don't know why this should affect submitting new batches

jsmitchell (Fri, 19 Apr 2019 16:41:59 GMT):
I would think that it would continue to try to publish blocks based on the presence of batches in the pending queue, per the rules of consensus

eugene-babichenko (Fri, 19 Apr 2019 16:42:30 GMT):
That might be related to our own modules (API, consensus) somehow. I will retry with devmode on a local setup to see how it goes.

jsmitchell (Fri, 19 Apr 2019 16:42:38 GMT):
good idea

jsmitchell (Fri, 19 Apr 2019 16:43:19 GMT):
you might want to set the inter block time for devmode so it has a chance to include some transactions, but you will need to handle that case of zero valid transactions from the pending queue for injection

eugene-babichenko (Fri, 19 Apr 2019 16:44:41 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=a9jYwKfkA9JEKpY8a) @jsmitchell So basically what I need to do is:
1. Finalize the schedule
2. Add injected batches to the end of the schedule via some new method
3. Wait for execution results
Did I get the idea?
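
A rough Python-style sketch of that flow (`add_batch_after_finalize` is a hypothetical name; it is exactly the new scheduler capability being discussed, not an existing API):
```python
def publish_with_block_end_injection(scheduler, injector, pending_batches):
    # 1. Finalize: stop accepting batches from the pending queue.
    scheduler.finalize()

    # 2. Append the injected batches. This needs the new capability,
    #    since today a finalized schedule cannot accept more batches.
    for batch in injector.block_end(pending_batches):
        scheduler.add_batch_after_finalize(batch)

    # 3. Block until execution results for everything are available.
    scheduler.complete(block=True)
```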

jsmitchell (Fri, 19 Apr 2019 16:46:17 GMT):
yes I think so. That is a material change to how it behaves currently, so it may be worth a short RFC to pad out the injection feature, if you are willing. At the very least, worth a discussion with others here @amundson @pschwarz @agunde etc

eugene-babichenko (Fri, 19 Apr 2019 16:47:28 GMT):
I think I will be able to prepare an RFC in a week or so. Right now I just need to make it work on my local fork :)

jsmitchell (Fri, 19 Apr 2019 16:48:18 GMT):
other related feature areas are giving the consensus engine appropriate, complete control over block publishing (i.e. don't make an 'empty' block a special case), and coming up with sdk/protocol support for defining and loading injector code

eugene-babichenko (Fri, 19 Apr 2019 16:49:32 GMT):
I would be happy to work on some of those once I have enough time. Especially on batch injectors

jsmitchell (Fri, 19 Apr 2019 16:49:48 GMT):
very cool :)

LeonardoCarvalho (Fri, 19 Apr 2019 19:09:19 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=oHbXgzCNhytLmAD2R) @Dan I've read references to that as "sharding", and I think it would be a good addition to the tool set. An endpoint can receive some data (hashes, signatures, etc.) to validate the state passed as valid for its genesis, with a "remote trusting" that ought to be revalidated at some points in time...

Dan (Fri, 19 Apr 2019 19:10:18 GMT):
I've normally seen sharding mean partitioning the database so some nodes handle some keys and other nodes handle other keys.

LeonardoCarvalho (Fri, 19 Apr 2019 19:12:08 GMT):
and that's the case, if you can abstract enough. :)

maksimb (Mon, 22 Apr 2019 12:39:46 GMT):
Has joined the channel.

eugene-babichenko (Wed, 24 Apr 2019 16:29:56 GMT):
Hello. Experienced another panic in the validator (1.1): ` thread 'ChainThread:CommitReceiver' panicked at 'No method cancel on python scheduler: PyErr { ptype: , pvalue: Some(KeyError('Value was not found',)), ptraceback: Some() }'` Unfortunately, I cannot provide the exact conditions to reproduce that, but maybe someone here can suggest why it happens?

amundson (Wed, 24 Apr 2019 16:32:22 GMT):
@pschwarz have you seen stuff like this before across the rust/python boundary? ^

artem.frantsiian (Wed, 24 Apr 2019 16:33:06 GMT):
Has joined the channel.

pschwarz (Wed, 24 Apr 2019 16:33:53 GMT):
I have not seen that

pschwarz (Wed, 24 Apr 2019 16:34:25 GMT):
I wonder if it's similar to some of that weird stack corruption we were seeing in the 100% python days

pschwarz (Wed, 24 Apr 2019 16:34:54 GMT):
s/similar/related

eugene-babichenko (Wed, 24 Apr 2019 16:35:28 GMT):
What was the issue with stack corruption?

pschwarz (Wed, 24 Apr 2019 16:38:14 GMT):
There would be times when the Python interpreter would crash complaining that methods were missing from objects that clearly should be there

pschwarz (Wed, 24 Apr 2019 16:38:27 GMT):
I think there might be an old Jira ticket for that :thinking:

jsmitchell (Wed, 24 Apr 2019 16:39:01 GMT):
i think @Dan closed that ticket recently

pschwarz (Wed, 24 Apr 2019 16:40:21 GMT):
Oh

pschwarz (Wed, 24 Apr 2019 16:40:27 GMT):
That explains why I can't find it

eugene-babichenko (Wed, 24 Apr 2019 16:41:21 GMT):
Was it fixed anywhere? I guess I can just incorporate the fix into my fork (I am still on 1.1.4)

pschwarz (Wed, 24 Apr 2019 16:42:40 GMT):
I don't think so - it certainly was a motivating factor in the rust rewrite, though

pschwarz (Wed, 24 Apr 2019 16:43:01 GMT):
But I'd have to look through the ticket notes, if I can find it

jsmitchell (Wed, 24 Apr 2019 16:44:57 GMT):
STL-1019

jsmitchell (Wed, 24 Apr 2019 16:45:20 GMT):
https://jira.hyperledger.org/browse/STL-1019

artem.frantsiian (Wed, 24 Apr 2019 16:56:25 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=Bi6amdC6NLwvF5nYH) @jsmitchell @pschwarz What can you suggest to solve this problem? We are about to run a public testnet on Sawtooth and want to make sure that the network will be fully operational.

pschwarz (Wed, 24 Apr 2019 18:47:02 GMT):
restarting the node is effective

Dan (Wed, 24 Apr 2019 19:05:42 GMT):
yeah that wouldn't take down the whole network. it would be a single node that could be restarted.

Dan (Wed, 24 Apr 2019 19:06:14 GMT):
@pschwarz feel free to reopen that if you think it's relevant. I saw no activity incl. reproducers on it and assumed it was dead.

pschwarz (Wed, 24 Apr 2019 19:11:42 GMT):
"Steps to reproduce" were certainly an issue with that ticket.

artem.frantsiian (Wed, 24 Apr 2019 20:49:02 GMT):
we'll try to provide actual steps to reproduce

DavidAEdwards (Thu, 25 Apr 2019 13:10:00 GMT):
Has joined the channel.

ruffsl (Fri, 26 Apr 2019 21:33:46 GMT):
Has joined the channel.

ruffsl (Fri, 26 Apr 2019 21:34:18 GMT):
daml

paul.sitoh (Sun, 28 Apr 2019 19:41:46 GMT):
Folks, I was wondering: if I were to replace the REST API with a client that talks directly to the validator, what is the zmq messaging model? Is it PUB/SUB or something else?

arsulegai (Mon, 29 Apr 2019 02:59:57 GMT):
@paul.sitoh You can use ROUTER/DEALER in zmq
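
A minimal pyzmq sketch of a DEALER client talking straight to the validator's component endpoint (this assumes the default tcp://localhost:4004 and the protobuf message classes shipped with the Python SDK):
```python
import uuid

import zmq
from sawtooth_sdk.protobuf.client_block_pb2 import (
    ClientBlockListRequest, ClientBlockListResponse)
from sawtooth_sdk.protobuf.validator_pb2 import Message

ctx = zmq.Context()
sock = ctx.socket(zmq.DEALER)
sock.connect("tcp://localhost:4004")  # validator component endpoint

# Every payload travels inside a Message envelope with a correlation id.
envelope = Message(
    message_type=Message.CLIENT_BLOCK_LIST_REQUEST,
    correlation_id=uuid.uuid4().hex,
    content=ClientBlockListRequest().SerializeToString())
sock.send(envelope.SerializeToString())

reply = Message()
reply.ParseFromString(sock.recv())
response = ClientBlockListResponse()
response.ParseFromString(reply.content)
print([b.header_signature for b in response.blocks])
```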

paul.sitoh (Mon, 29 Apr 2019 07:20:08 GMT):
Ok thanks

paul.sitoh (Mon, 29 Apr 2019 07:20:26 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=9M5yrkLgZCSQkPyHB) @arsulegai Thanks

paul.sitoh (Tue, 30 Apr 2019 08:35:58 GMT):
@arsulegai or others: when we route a message, via the ROUTER/DEALER model, to a validator, is there any particular socket `address` we should push to?

paul.sitoh (Tue, 30 Apr 2019 08:55:53 GMT):
For example:
```
socket.send(address, ZMQ.SNDMORE);
socket.send("This is the payload.".getBytes(), NOFLAGS);
```

artem.frantsiian (Thu, 02 May 2019 12:02:02 GMT):
Hello, why do I get this in my logs?
```
[11:49:20.397 node-1-testnet.remme.io [Thread-18] handlers WARNING] Connecting peer provided an invalid endpoint: tcp://:8800; Ignoring connection request.
[11:49:20.398 node-1-testnet.remme.io [Thread-18] dispatch WARNING] Sending hang-up in reply to NETWORK_CONNECT to connection f2e7abf91a8701602233c8f63169e2e2116e5fbb49bc391e660d4699a5eb1ad52dc7ce2c79246886f71447746b068d4277c4a26b1497f8efa5e88efca31c97e3
[11:49:23.412 node-1-testnet.remme.io [Thread-8] handlers WARNING] Connecting peer provided an invalid endpoint: tcp://:8800; Ignoring connection request.
[11:49:23.412 node-1-testnet.remme.io [Thread-8] dispatch WARNING] Sending hang-up in reply to NETWORK_CONNECT to connection c0e26df110b2a22effea302d29151d4256efc8e28f79c80bf25e627f9f7c8acd8ffb7fb076a24f2ad563760288a923d84b49f1688a6c5bdf208f337c4a38fd8f
[11:49:26.428 node-1-testnet.remme.io [Thread-23] handlers WARNING] Connecting peer provided an invalid endpoint: tcp://:8800; Ignoring connection request.
[11:49:26.429 node-1-testnet.remme.io [Thread-23] dispatch WARNING] Sending hang-up in reply to NETWORK_CONNECT to connection fa69c2f2a5622076d5ec631dd62f489884cca22c088345d0f821779d77d110241a003525805883e0bcd5cfa250ae886b337fd6b22b7ac5cd31905f65d1aac33c
[11:49:29.442 node-1-testnet.remme.io [Thread-21] handlers WARNING] Connecting peer provided an invalid endpoint: tcp://:8800; Ignoring connection request.
[11:49:29.442 node-1-testnet.remme.io [Thread-21] dispatch WARNING] Sending hang-up in reply to NETWORK_CONNECT to connection 4505fe3a7b50dc218d51a50ca6dec8a24a7439f99c7a1d84a90b7c084985af29978fcf14f2909434dd197fc440725645457a619f1e43a39db558596e760da95e
[11:49:32.459 node-1-testnet.remme.io [Thread-23] handlers WARNING] Connecting peer provided an invalid endpoint: tcp://:8800; Ignoring connection request.
[11:49:32.459 node-1-testnet.remme.io [Thread-23] dispatch WARNING] Sending hang-up in reply to NETWORK_CONNECT to connection 228d3418ecfbd9ca2e094218c67b0ba74925ab081b810b30f685942e80fab034ae215d1cf7bde2892f9fe33835440d5d3977f96b7e960e986ca6fd721e9c200a
```

DavidAEdwards (Thu, 02 May 2019 12:03:32 GMT):
You should probably ask that in #sawtooth, this channel is reserved for core dev discussions. Less so for troubleshooting.

artem.frantsiian (Thu, 02 May 2019 12:04:29 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=ebCzqCbstHt6qapo5) @DavidAEdwards thanks

artem.frantsiian (Thu, 02 May 2019 14:54:54 GMT):
Could somebody help? We get the following logs from the validator container: https://gist.github.com/ArtemFrantsiian/97e79c8bf0c13dab7e259250809cbf61

jsmitchell (Thu, 02 May 2019 14:58:49 GMT):
@artem.frantsiian looks like you have modified the validator/build process?

eugene-babichenko (Thu, 02 May 2019 15:01:47 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=HNrbfGZastf9icLPQ) @jsmitchell The validator was modified a bit. The build process remains unchanged.

jsmitchell (Thu, 02 May 2019 15:04:52 GMT):
I don't think anyone's going to be able to interpret that backtrace for you. If such a thing were happening to me, I'd make sure to build with debug symbols and use a version of gdb that's capable of decoding python stack frames

jsmitchell (Thu, 02 May 2019 15:06:23 GMT):
the set_merkle_root prior to the ffi boundary is probably a clue, but I don't want to speculate

bobonana (Thu, 02 May 2019 18:47:08 GMT):
https://stackoverflow.com/questions/54372292/sawtooth-1-1-2-getting-rejected-due-to-invalid-predecessor-when-adding-block-t Over in sawtooth-next-directory we're getting the same behavior: a `rejected due to invalid predecessor` error when attempting to add a new block to the genesis block. The cause seems non-deterministic; we can follow the same deployment steps on the same server after removing all images, containers, volumes, etc. and restarting the docker daemon between builds. It repeats, just not consistently. It doesn't happen often, but it prevents us from adding new blocks when it does. This all started after our upgrade from Sawtooth v1.0 to v1.1. Is this expected behavior, and is there a preferred way to recover from it or prevent it entirely? We're running in developer mode with the devmode-engine image.

bobonana (Thu, 02 May 2019 18:52:43 GMT):
On a separate note, we ran some performance tests and noticed that 500 transactions took around 1.3 minutes to be processed using v1.0, but with the switch to v1.1 those same 500 transactions now take around 9.3 minutes to be processed. Is this performance hit expected with the update?

danintel (Thu, 02 May 2019 21:35:29 GMT):
@bobonana I don't know of a performance hit. With what consensus? (If you are using DevMode, don't use it to measure performance.)

bobonana (Thu, 02 May 2019 21:54:55 GMT):
@danintel the performance hit is with DevMode. It's not a big deal then, since it won't affect prod, but out of curiosity, do you know the cause of the performance hit in DevMode (and why other consensus algos weren't similarly affected)?

danintel (Thu, 02 May 2019 22:01:30 GMT):
@bobonana No I do not. Good question for #sawtooth-consensus-dev. I know that DevMode tries to publish blocks as fast as possible and is more fragile than the other "real" consensus engines.

bobonana (Thu, 02 May 2019 22:12:28 GMT):
@danintel ah, that makes sense. I'll check in that room too. Thanks! On a side note: is there a preferred method of setting a docker healthcheck for the sawtooth, settings-tp, and validator images? Is it possible that attempting to add to the chain before either component is ready would cause the error?

danintel (Thu, 02 May 2019 22:16:43 GMT):
Submitted transactions just sit there in the validator until the TP is up and running.

bobonana (Thu, 02 May 2019 22:28:33 GMT):
@danintel Really? Our TP seems to be functioning fine; the validator just gives a `rejected due to invalid predecessor` and stops processing transactions altogether afterwards. When it happens, it only seems to happen during the addition of a block to the genesis block (this is when we bootstrap our local admin user, which results in a message being submitted to the validator, and happens right after everything initializes). Waiting doesn't seem to help; we've had to restart the application up until now (though I'll try restarting just the TP image next time and see if that works).

artem.frantsiian (Fri, 03 May 2019 11:11:02 GMT):
hello, I tried to get the settings of my network with the sawtooth CLI and got the following error:
```
root@node-genesis-testnet-dev:/# sawtooth settings list --url http://localhost:8008
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sawtooth_cli/main.py", line 174, in main_wrapper
    main()
  File "/usr/lib/python3/dist-packages/sawtooth_cli/main.py", line 162, in main
    do_settings(args)
  File "/usr/lib/python3/dist-packages/sawtooth_cli/settings.py", line 86, in do_settings
    _do_settings_list(args)
  File "/usr/lib/python3/dist-packages/sawtooth_cli/settings.py", line 110, in _do_settings_list
    setting.ParseFromString(decoded)
google.protobuf.message.DecodeError: Error parsing message
root@node-genesis-testnet-dev:/#
```

amundson (Fri, 03 May 2019 14:53:06 GMT):
@bobonana not sure, but possible that is being caused by something non-deterministic in your transaction processor

amundson (Fri, 03 May 2019 14:54:30 GMT):
could possibly explain the performance issue too, if the TP is causing random published blocks to be marked invalid

amundson (Fri, 03 May 2019 14:55:43 GMT):
@artem.frantsiian can you report that in JIRA so we can track it?

artem.frantsiian (Fri, 03 May 2019 15:16:01 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=A9gqHDn2g778rsAps) @amundson yes, I'll report it soon

bobonana (Fri, 03 May 2019 22:11:26 GMT):
@amundson we're using the default validator image; how could I check if there's something non-deterministic?

wchang (Sat, 04 May 2019 05:49:29 GMT):
Hi all... I was trying to play around with Sawtooth on a 32-bit machine, and I had to update lmdb.rs as follows to get it to compile:
```
-const DEFAULT_SIZE: usize = 1 << 40; // 1024 ** 4
+//const DEFAULT_SIZE: usize = 1 << 32; // 1024 ** 4
+const DEFAULT_SIZE: usize = usize::max_value();
```

wchang (Sat, 04 May 2019 05:51:24 GMT):
so I had 2 questions: 1. What is the strategy for Sawtooth once the database/blockchain approaches the limit? Is forking the way to go?

wchang (Sat, 04 May 2019 05:52:22 GMT):
since my machine is 32-bit I'll run into this issue a lot sooner :)

artem.frantsiian (Sat, 04 May 2019 07:59:13 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=A9gqHDn2g778rsAps) @amundson I created an issue https://jira.hyperledger.org/browse/STL-1532

arsulegai (Sat, 04 May 2019 08:45:47 GMT):
@bobonana determinism in the TP. Your TP is expected to return the same result no matter when a transaction is executed.
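
A tiny illustration of the difference (hypothetical handler code; `parse_deadline`, `parse_timestamp`, and `DEADLINE` stand in for app-specific details):
```python
import time

from sawtooth_sdk.processor.exceptions import InvalidTransaction

DEADLINE = 1557000000  # illustrative constant

def apply_nondeterministic(transaction, context):
    # Bad: validators execute at different times, so the same
    # transaction can be valid on one node and invalid on another.
    if time.time() > DEADLINE:
        raise InvalidTransaction("too late")

def apply_deterministic(transaction, context):
    # Good: decide only from the payload and state, which are
    # identical on every validator whenever the txn is executed.
    submitted = parse_timestamp(transaction.payload)  # hypothetical
    deadline = parse_deadline(transaction.payload)    # hypothetical
    if submitted > deadline:
        raise InvalidTransaction("too late")
```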

robinbryce (Mon, 06 May 2019 08:31:57 GMT):
Has joined the channel.

amundson (Tue, 07 May 2019 19:33:18 GMT):
note - there is now a #transact channel for the new HL project that Sawtooth, Grid, and potentially others will eventually use for transaction processing

bobonana (Wed, 08 May 2019 17:59:24 GMT):
@arsulegai so why is it that when I switch back to sawtooth v1.0 and leave the TP untouched I don't get this error? I'll definitely look to see if we messed up the TP, but I would imagine if that were the issue then this error would manifest using v1.0 as well?

arsulegai (Thu, 09 May 2019 01:30:45 GMT):
@bobonana it should be the same in other versions. Do you have logs to see what's happening?

arsulegai (Mon, 13 May 2019 15:22:04 GMT):
Yes, I am thinking of not trying outbound connections if the peering request is rejected because the endpoint is already a peer. I don't know the implications of this with respect to losing connections, though.

arsulegai (Mon, 13 May 2019 15:22:19 GMT):
Any comments on this?

arsulegai (Mon, 13 May 2019 15:36:00 GMT):
(This assumes a small correction is needed in the code, to not allow a peering request to pass through if an endpoint is already a peer.)

Dan (Mon, 13 May 2019 15:50:31 GMT):
I'm having trouble with your first sentence, could you please reword @arsulegai

arsulegai (Mon, 13 May 2019 15:56:51 GMT):
I am resuming a discussion from a while ago; this is regarding the multiple-peering-connection issue.

arsulegai (Mon, 13 May 2019 15:57:33 GMT):
For example: https://github.com/hyperledger/sawtooth-core/blob/c209c6458aa9ee616d3a8c60a219f213d0915cb6/validator/sawtooth_validator/gossip/gossip.py#L231 adding an extra check here could avoid duplicate peering connections to the same endpoint
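
A sketch of the kind of guard meant here (illustrative only; it assumes `peers` maps connection_id to endpoint, as in gossip.py):
```python
def endpoint_already_peered(peers, endpoint):
    """peers: dict of connection_id -> endpoint.
    Checking this before register_peer accepts a request would stop a
    second peering connection to the same endpoint."""
    return endpoint in peers.values()

# e.g. inside register_peer, before adding the new connection:
#   if endpoint_already_peered(self._peers, endpoint):
#       reject the request instead of adding a duplicate
```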

arsulegai (Mon, 13 May 2019 16:00:13 GMT):
I thought it's better not to establish the connection at all (and not retry, either) if a peering connection request is rejected for the above reason.

Dan (Mon, 13 May 2019 16:13:09 GMT):
I would think that code should be updated to enforce that `endpoint` is unique in `_peers`. I don't know what you mean about rejected though. If the initiator is rejected does that code still add the connection on the initiator side?

arsulegai (Mon, 13 May 2019 16:18:47 GMT):
Yes, I observed that behavior. I will run it again and confirm

Dan (Mon, 13 May 2019 16:33:42 GMT):
That sounds like two bugs then: 1) to maintain a unique set of endpoints, and 2) to not establish a 'half-open' connection.

Dan (Mon, 13 May 2019 16:34:03 GMT):
Thanks for initiating the Proxy RFC.

arsulegai (Mon, 13 May 2019 16:35:31 GMT):
Why would you call it a half-open connection? The old connection is removed and a new connection request is sent again

arsulegai (Mon, 13 May 2019 18:27:17 GMT):
Would it have any impact if we stopped retrying the new connection when a peering connection couldn't be established for a specific reason?

arsulegai (Mon, 13 May 2019 18:35:08 GMT):
The retry logic is here; maybe make it dynamically configurable via the settings TP? https://github.com/hyperledger/sawtooth-core/blob/4b6f63f3e2a0e9518c3f73c025fa2875422ee38b/validator/sawtooth_validator/gossip/gossip.py#L667

arsulegai (Wed, 15 May 2019 06:34:29 GMT):
@amundson ^

GianlucaSchoefer (Thu, 16 May 2019 07:22:18 GMT):
Has joined the channel.

Kirill_Vusik (Thu, 16 May 2019 23:17:55 GMT):

Clipboard - May 17, 2019 2:17 AM

danintel (Thu, 16 May 2019 23:20:36 GMT):
@Kirill_Vusik According to the error message, it's a transient error due to a file update. Have you retried the install?

Kirill_Vusik (Thu, 16 May 2019 23:23:08 GMT):
I have been trying to build the core for one hour. These logs are from `docker-compose build`

Kirill_Vusik (Thu, 16 May 2019 23:58:47 GMT):
docker-compose build --no-cache resolved the issue, sorry for bothering

arsulegai (Mon, 20 May 2019 16:01:19 GMT):
@pschwarz @ltseeley @agunde @amundson Regarding the race condition observed with the patch to fix multiple peering connections, please let us know how you wish to proceed. Do you have a proposal, or shall I start working on it?

ltseeley (Mon, 20 May 2019 16:04:53 GMT):
@arsulegai you're welcome to propose a solution. It'll probably involve some deterministic decision based on some property of the connections that both of the nodes have knowledge of.

ltseeley (Mon, 20 May 2019 16:06:36 GMT):
Both nodes know their own endpoint and the other node's endpoint, for instance. We can't rely on timing or some information that's only known to the local validator (like a connection ID).

arsulegai (Mon, 20 May 2019 16:12:09 GMT):
Thinking out loud here; I haven't analyzed the implications in detail. How about sending a clue about all connections between the validators as part of the acknowledgement, sorted with respect to connection IDs, and in case of a race condition removing all connections except the first one?

ltseeley (Mon, 20 May 2019 16:19:40 GMT):
Still a chance for a race condition; the two nodes could send the clues to each other and still decide on different connections.

ltseeley (Mon, 20 May 2019 16:20:11 GMT):
The same connection can have different IDs on different nodes, I believe

arsulegai (Mon, 20 May 2019 16:48:39 GMT):
Right! Then every node would need to know the other node's connection ID

ltseeley (Mon, 20 May 2019 17:47:30 GMT):
They would also have to figure out which connection IDs match up

agunde (Mon, 20 May 2019 18:01:54 GMT):
It may be useful to use the connecting validator's public key. This would move the check to after authorization.

agunde (Mon, 20 May 2019 18:04:54 GMT):
But it would provide a consistent ID to check.
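
One deterministic rule along those lines (a sketch, assuming both sides know both public keys after authorization): keep only the connection initiated by the validator with the smaller public key, so both nodes independently drop the same duplicate.
```python
def keep_connection(local_pubkey, remote_pubkey, locally_initiated):
    """Tie-break for duplicate connections between the same two peers.
    Both sides evaluate the same rule on the same inputs, so they agree
    without exchanging any timing-dependent state."""
    initiator = local_pubkey if locally_initiated else remote_pubkey
    return initiator == min(local_pubkey, remote_pubkey)

# Node A (key "aa") initiated one connection, node B (key "bb") the
# other; both nodes keep A's connection and drop B's:
assert keep_connection("aa", "bb", locally_initiated=True)      # on A
assert not keep_connection("bb", "aa", locally_initiated=True)  # on B
```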

kodonnel (Wed, 22 May 2019 15:40:04 GMT):
Is there a relatively convenient way to get to the block_id from the block_num on the client side? I've been scanning the various repositories and nothing jumps out at me.

amundson (Wed, 22 May 2019 21:11:20 GMT):
@kodonnel in the rest api, you can do GET /blocks
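
For instance, a small sketch with the requests library, paging through GET /blocks (the REST API returns blocks newest-first, with a paging.next link) to resolve a block_num to its block id; a REST API at http://localhost:8008 is assumed:
```python
import requests

def block_id_for_num(rest_api, block_num):
    """Walk GET /blocks until the wanted height is found. Fine for
    occasional lookups; frequent use would want a local cache."""
    url = rest_api + "/blocks"
    while url:
        body = requests.get(url).json()
        for block in body["data"]:
            if int(block["header"]["block_num"]) == block_num:
                return block["header_signature"]
        url = body.get("paging", {}).get("next")  # next (older) page
    return None

print(block_id_for_num("http://localhost:8008", 42))
```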

kodonnel (Thu, 23 May 2019 13:00:38 GMT):
I'll have a closer look. I'm hoping for something where I don't have to maintain a mapping/cache client-side, and that also avoids walking back from now to the beginning of the chain. I'm looking to replay events from a point in the chain; block_num is a bit more convenient for that than block_id (bearing in mind forks, etc.).

kodonnel (Thu, 23 May 2019 13:15:12 GMT):
It's really all about the client_events_subscribe semantics. If you are doing a catch-up subscription right now, you need the last_known_block_ids, implying that you need to persist them somewhere locally in an ordered or linked list, and then you need to accommodate forks or assume non-forking. It might be much easier if you could catch up via block height and then detect forks in a separate task. Even better would be to just query block_id by block_num. This may all be in the rest-api; if so, question answered.

pschwarz (Fri, 24 May 2019 15:40:44 GMT):
In grid, as well as other proof-of-concept apps, we store the (block id, block number) tuple in the db. At every event, we check to see if we've seen the block height and perform some fork resolution on the other side. This also doubles as a source of the last_known_block field. Here's a pretty simple example: https://github.com/peterschwarz/state-export-prototype Likewise, in Grid, here's the fork resolution step: https://github.com/hyperledger/grid/blob/master/daemon/src/event/block.rs#L81-L88
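
A compact sketch of that bookkeeping (using a plain dict in place of the db):
```python
seen = {}  # block_num -> block_id, filled from block commit events

def on_block_event(block_num, block_id):
    # A height we've already seen with a different id means a fork:
    # drop everything at or above that height, then record the new block.
    if block_num in seen and seen[block_num] != block_id:
        for num in [n for n in seen if n >= block_num]:
            del seen[num]
        # ...also roll back any state derived from the dropped blocks
    seen[block_num] = block_id

def last_known_block():
    # Doubles as the last_known_block field when (re)subscribing.
    return seen[max(seen)] if seen else None
```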

mfford (Tue, 28 May 2019 17:51:04 GMT):
Reminder that the Hyperledger Sawtooth Contributor Meeting has been moved to tomorrow, Wednesday, May 29th at 10am CDT. The meeting information can be found on the Hyperledger Community Meetings Calendar located here: https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings We are finalizing the agenda for this meeting. If you have an appropriate topic you would like to discuss and facilitate, please add it to the agenda, located in the wiki here: https://wiki.hyperledger.org/pages/viewpage.action?pageId=13861698 Looking forward to seeing everyone there! -Mark

mfford (Wed, 29 May 2019 14:54:38 GMT):
Please use this number for the Hyperledger Sawtooth Contributor's Meeting:

mfford (Wed, 29 May 2019 14:54:40 GMT):
https://zoom.us/j/332613493
One tap mobile:
+16468769923,,332613493# US (New York)
+16699006833,,332613493# US (San Jose)
Dial by your location:
+1 646 876 9923 US (New York)
+1 669 900 6833 US (San Jose)
+1 408 638 0968 US (San Jose)
Meeting ID: 332 613 493
Find your local number: https://zoom.us/u/acj8QRhsif

rbuysse (Wed, 29 May 2019 14:58:05 GMT):
^ note, this is different than what was shared previously

Kirill_Vusik (Thu, 30 May 2019 09:11:24 GMT):
Hi everyone, I would like to try fixing this issue: https://jira.hyperledger.org/browse/STL-1209. It is going to take some time for me to go through the contributing process for the first time, so I want to start with this task (fixing an integration test)

Dan (Thu, 30 May 2019 13:54:17 GMT):
cool, thanks @Kirill_Vusik . It looks like @benoit.razet may have already taken care of that: https://github.com/hyperledger/sawtooth-core/commit/1246bcfd94aa3fe55be64a8bfc30da3c9eb9fa04 And then that file seems to have subsequently been moved or removed. I haven't found the commit that moved the file, but I no longer see it in master.

Dan (Thu, 30 May 2019 13:54:56 GMT):
If you find that the issue is no longer relevant please note the commit that resolves it (possibly the one above) and close the ticket. That alone is a help to keep jira up to date.

Kirill_Vusik (Thu, 30 May 2019 14:02:25 GMT):
Right, it has been resolved already, thank you. I was going to look into it later. I noticed that Need-Review is set, does this label indicate that there is a code review already?

agunde (Thu, 30 May 2019 14:06:10 GMT):
@Kirill_Vusik the "Needs-Review" label indicates that the jira story itself needs to be reviewed to see if it is still valid, needs more information, etc. It has been added to all the stories that were in the backlog. It is part of a planned backlog grooming.

Kirill_Vusik (Thu, 30 May 2019 14:08:06 GMT):
@agunde, got it, thank you

pschwarz (Thu, 30 May 2019 15:07:04 GMT):
I've marked the issue as "Done"

Kirill_Vusik (Thu, 30 May 2019 15:45:31 GMT):
May I take this one (https://jira.hyperledger.org/browse/STL-998) then? It looks like sawadm.py is still here

amundson (Thu, 30 May 2019 15:57:24 GMT):
@Kirill_Vusik sure. it's hinted at there, but basically, we need to make sure we have ~100% compatibility between the python and rust sawadm CLIs first ( @rberg2 looked into this not too long ago and might remember the detail). There were a couple options that weren't in the rust one yet (--version,-V or something like that?). Then we need to figure out the packaging for how to include the sawadm ( @rbuysse had thoughts on this previously).

amundson (Thu, 30 May 2019 15:59:11 GMT):
@Kirill_Vusik if you are actively working on it, move the Status to "In Progress" which will make it show up on this board - https://jira.hyperledger.org/secure/RapidBoard.jspa?rapidView=232&quickFilter=620

rberg2 (Thu, 30 May 2019 16:09:08 GMT):
Hello, I found that the rust version of `sawadm genesis` had no console output, and `-v` was missing. Otherwise it is drop-in compatible, with additional features.

Kirill_Vusik (Thu, 30 May 2019 16:10:33 GMT):
@amundson @rberg2 so removing the python sawadm might be premature, right? I am looking for a small task (bug/improvement) to work on to understand the commit process

amundson (Thu, 30 May 2019 16:11:59 GMT):
@Kirill_Vusik you could add -v (verbose flag) to sawadm genesis and try to get its output closer to the python version's

amundson (Thu, 30 May 2019 16:12:12 GMT):
(to rust sawadm genesis)

rberg2 (Thu, 30 May 2019 16:12:59 GMT):
This is more of a packaging issue than a code issue, but it would be pretty simple. https://jira.hyperledger.org/browse/STL-1489

amundson (Thu, 30 May 2019 16:13:09 GMT):
working toward using the new sawadm is high-value

Kirill_Vusik (Thu, 30 May 2019 16:17:29 GMT):
@amundson , @rberg2 thank you. @amundson , if you don't mind, I will take STL-1489 first and then I will start with STL-998 and the work related to it (-v, console output)

amundson (Thu, 30 May 2019 16:18:01 GMT):
sounds great!!

rberg2 (Thu, 30 May 2019 16:26:04 GMT):
I created this ticket about the command output. https://jira.hyperledger.org/browse/STL-1498

amundson (Thu, 30 May 2019 18:03:38 GMT):
@rberg2 @rbuysse in a recent discussion, @JonGeater brought up issues with docker-compose compatibility between sawtooth-core and sawtooth-seth, specifically when you need to build the sawtooth-core and sawtooth-seth images and test them together. we might want to apply a similar pattern to it as we are using with the consensus engines (where seth doesn't contain the validator images)

JonGeater (Thu, 30 May 2019 18:03:39 GMT):
Has joined the channel.

rbuysse (Thu, 30 May 2019 18:06:09 GMT):
yeah I think so

rbuysse (Thu, 30 May 2019 18:06:22 GMT):
another thing we're gonna want to do is upgrade all the compose files to a current version

pschwarz (Thu, 30 May 2019 20:31:54 GMT):
https://github.com/hyperledger/sawtooth-core/pull/2127 has been merged, which fixes the failing builds. Rebase any outstanding PR's on master.

pschwarz (Thu, 30 May 2019 20:32:17 GMT):
Outstanding sawtooth-core PR's, that is

Pradeep_Pentakota (Mon, 03 Jun 2019 10:28:01 GMT):
Has joined the channel.

JonGeater (Sun, 09 Jun 2019 23:12:13 GMT):
Hi guys, sorry for going quiet on this, got diverted into some customer action for a week. Anyway, on the point of the Docker-compose files, I'm trying to do a couple of things right now:
- Support eth_call (mainly for compatibility with popular things like Go-Ethereum)
- Investigate some issues we're having with RAFT
In order to debug all of this I need to have locally-built bits of -seth, -raft and -core, and while any competent dev can contrive this and make it work, I raised the question because this seems to be something we have to tackle with the looming 'pluggable everything' strategy driving through Hyperledger. Sean put the main issue much more concisely than I did, but the sort of issues I think need a holistic re-design are things like the validator definition in each of the files - which sometimes rebuilds all the rust on every invocation, and/or does it (partially) in --build, and/or re-creates the genesis block every time. Consistency and reproducibility would be good in this area. There are also smaller things like strategy over shared/persistent volumes and network specifications when mixing and matching built containers from different places.

JonGeater (Sun, 09 Jun 2019 23:16:54 GMT):
I have some other questions too about being a good citizen in sawtooth contributions. I've got eth_call working solidly now and discovered a couple of quite serious bugs in the utility code (incorrect exceptions being raised, incorrect treatment of data types leading to crashes). Does the community prefer one big PR for "implement eth_call" or creating several tiny issues on Jira and submitting individual PRs for those, followed by the eth_call stuff?

JonGeater (Sun, 09 Jun 2019 23:18:18 GMT):
And lastly: what's the strategy on rust vs python for things like the journal? I've been working on the python but is that going to be deprecated soon? Is this the sort of thing the Wednesday morning (08:30 UK time) call addresses?

Dan (Mon, 10 Jun 2019 14:10:23 GMT):
wow, great list @JonGeater ! Did you know we also have PBFT now? I'm curious if you have some uses that drive more towards Raft? There's been some discussion here in the recent past about connection management. If that feels related to the issues you've seen I can say more there. I'm fine with a big PR so long as it has appropriately structured commits and clear commit messages. In particular I don't personally have a need for jira entries for those defects, again so long as the commit messages explain the 'why'. The Wednesday meeting is @amolk 's. It tends to cover TP implementation for app devs more than core dev concepts. The short answer on the Journal is the center of the validator gets replaced by the new Hyperledger Transact project which is Rust. @ltseeley could say more on the existing journal code but I think even absent Transact the python was getting removed in favor of rust. The core of your docker questions is out of my realm but @rbuysse spends a lot of his time there.

JonGeater (Mon, 10 Jun 2019 14:16:01 GMT):
Thanks @Dan. Yes I know PBFT is in now but one step at a time, we only just got this version stable! We'll move a little later on. Also confirms it's worth maintaining the python for a while so that's good. What's less good is that the compatibility stuff I've implemented for the python version doesn't appear to be present in Rust so we will see a regression when the change happens...is there some overall ticket/issue I can be added to so I can see when this happens and work on the port when the time is right? Maybe a question for @ltseeley ?

JonGeater (Mon, 10 Jun 2019 14:17:22 GMT):
I also forgot the most important of my overarching project questions: the sawtooth-seth stuff depends on a change in sawtooth-core to work properly. What's the strategy for dependency management in these intermingled projects? And RTFM answer is fine, if you can just point me at the correct M.

Dan (Mon, 10 Jun 2019 14:20:12 GMT):
hahaha I think you might have to WTFM ;) I think what I've seen in the past is just referencing the core PR from the depending component PR.

amundson (Mon, 10 Jun 2019 14:23:48 GMT):
@Dan it's not quite accurate that journal will be replaced by transact. journal will use transact (scheduler, executor, context manager, etc. will be replaced). but, generally speaking, everything in the validator will eventually be Rust.

amundson (Mon, 10 Jun 2019 14:26:22 GMT):
@JonGeater bigger PRs mean longer review times. it can be a trade-off. but like @Dan said, good commits are important

Dan (Mon, 10 Jun 2019 14:27:00 GMT):
I worded it quite carefully, "...the center of the validator..." ;)

amundson (Mon, 10 Jun 2019 14:27:56 GMT):
if there are small fixes you can get in that aren't part of your larger effort, you should definitely submit them earlier and not bundle unrelated things together

amundson (Mon, 10 Jun 2019 14:30:38 GMT):
@JonGeater the docker stuff in Grid is where we are headed with Docker files for Sawtooth (they have a better approach for Rust code)

Dan (Mon, 10 Jun 2019 14:39:21 GMT):
Does that include Track & Trace? Cuz last time I ran that it was .. expensive.

Dan (Mon, 10 Jun 2019 14:39:47 GMT):
last time ~= 2 weeks ago maybe

amundson (Mon, 10 Jun 2019 14:43:26 GMT):
compiling Rust is expensive (in terms of time). the differences in Grid's Dockerfiles allow for more caching during specific activities of the dev workflow. if you are just coming to the project having not worked on it recently, you are always going to have to compile the whole thing.

JonGeater (Mon, 10 Jun 2019 14:47:29 GMT):
Thanks @amundson, so I'll push the little fixes ASAP

JonGeater (Mon, 10 Jun 2019 15:14:14 GMT):
(To be clear and answer your wording precisely, they were found during my larger effort, and are essential to it being finished, but being changes to utility code they stand alone too and would benefit from broader testing/review)

Dan (Mon, 10 Jun 2019 18:45:36 GMT):
We are likely to get Portuguese doc translations from the Brazil bootcamp. Anyone have thoughts on how we could manage those? Maybe a portugues-docs branch on core for starters.

achenette (Mon, 10 Jun 2019 21:50:50 GMT):
@Dan - This sounds very exciting! I have thoughts. But first, some questions.
- Would this be just the docs?
- Would/could it include README.md and other *.md files in the repos?
- Might it eventually include localization as well as translations for messages, strings, command prompts, and comments? (Note that code comments include, but are not limited to, the SDK doc comments used by javadoc, sphinx, rustdoc, etc.) Maybe even full-blown internationalization? (See the definition link below.)
I don't know if a separate branch per language is the best approach, but am willing to believe that it could be pretty good. Perhaps someone knows how other projects and repos handle/organize their translations and localizations? P.S. Definitions for internationalization (i18n) and localization (aka l10n): https://www.w3.org/International/questions/qa-i18n

amundson (Mon, 10 Jun 2019 21:52:55 GMT):
@Dan could start as a branch in someone's repo and once we have some content, we can play around with actual organization. My primary concern is how it would be maintained over time.

achenette (Mon, 10 Jun 2019 21:56:13 GMT):
:heavy_plus_sign:

achenette (Mon, 10 Jun 2019 21:58:05 GMT):
Ongoing translation of large doc sets is usually handled by a Content Management System backed by a content-control database. But that stuff is very expensive and time-consuming to implement. (It pays for itself in a large company that does lots of translations, but Open Source isn't like that.)

amundson (Mon, 10 Jun 2019 21:59:04 GMT):
I've only done this w/gettext before, so that was just strings not larger docs.

achenette (Mon, 10 Jun 2019 22:03:37 GMT):
There seems to be something called Poedit that works with gettext. Again, just strings, not full docs. Documentation (aka human language) is complicated.

achenette (Mon, 10 Jun 2019 22:03:43 GMT):
https://poedit.net/

achenette (Mon, 10 Jun 2019 22:04:47 GMT):
Poedit's docs are translated by human beans. :-) https://crowdin.com/project/poedit

achenette (Mon, 10 Jun 2019 22:06:43 GMT):
Kubernetes has a nice guide on how they organize their doc/website translations: https://kubernetes.io/docs/contribute/localization/#translating-content (Putting this URL here so I don't forget it)

rjones (Tue, 11 Jun 2019 18:43:12 GMT):
It looks like Dan Mack hasn't made a commit to Sawtooth this year. He has a commit bit and no 2FA. Is he still active, or no? https://github.com/danmack

amundson (Tue, 11 Jun 2019 18:46:13 GMT):
@rjones I'll DM you his email address if you want to try and reach out to him

rjones (Tue, 11 Jun 2019 18:46:40 GMT):
thanks, that would be nice.

arsulegai (Thu, 13 Jun 2019 08:48:43 GMT):
Hi all, I was away on personal leave and missed the recent Hyperledger Sawtooth contributor's meeting. Are there minutes (MoM) which I can read through?

mfford (Thu, 13 Jun 2019 14:01:59 GMT):
@arsulegai For the 5/29 session, meeting notes were not posted, as the meeting itself was adjourned early due to low attendance (in part because of the US holiday reschedule). The next Hyperledger Sawtooth Contributor Meeting will be back to our regular schedule cadence on June 24th at 10am CDT.

arsulegai (Thu, 13 Jun 2019 14:24:53 GMT):
Oh! Ok

Dan (Thu, 13 Jun 2019 15:01:34 GMT):
when you weren't there no one else wanted to be there either ;D

arsulegai (Thu, 13 Jun 2019 16:22:07 GMT):
:sweat_smile:

bryangross (Fri, 14 Jun 2019 00:15:30 GMT):
Has joined the channel.

patelkishan (Mon, 17 Jun 2019 19:47:30 GMT):
Has joined the channel.

patelkishan (Mon, 17 Jun 2019 19:47:32 GMT):
Hello everyone, I am interested in benchmarking sawtooth-pbft using Caliper, so can someone guide me through the process? Thank you!

Dan (Wed, 19 Jun 2019 13:18:02 GMT):
@danintel @dplumb @achenette There's some PRs growing stale in simple supply. Could you guys coordinate to get them resolved?

dplumb (Wed, 19 Jun 2019 16:17:24 GMT):
I commented on them a while ago, they didn't seem to be the right solution to me

Dan (Wed, 19 Jun 2019 19:42:27 GMT):
So the central issue seems to be the protoc built for repo.sawtooth.me? @danintel can you clarify further?

Dan (Wed, 19 Jun 2019 19:47:19 GMT):
There may be a meta issue here around the navigability of repo.sawtooth.me. It's hard to tell what's available.

rbuysse (Thu, 20 Jun 2019 13:31:44 GMT):
the repo isn't directly browsable, but you can see what packages the repository provides

rbuysse (Thu, 20 Jun 2019 13:33:28 GMT):
using bumper/stable as an example, if you have the repo in your /etc/apt/sources.list

rbuysse (Thu, 20 Jun 2019 13:33:34 GMT):
navigate to /var/lib/apt/lists

rbuysse (Thu, 20 Jun 2019 13:34:25 GMT):
the available packages are in an lz4 compressed file repo.sawtooth.me_ubuntu_bumper_stable_dists_xenial_universe_binary-amd64_Packages.lz4

rbuysse (Thu, 20 Jun 2019 13:34:52 GMT):
you can use the lz4cat utility to view the contents

rbuysse (Thu, 20 Jun 2019 13:34:57 GMT):
`lz4cat repo.sawtooth.me_ubuntu_bumper_stable_dists_xenial_universe_binary-amd64_Packages.lz4`

danintel (Thu, 20 Jun 2019 18:14:23 GMT):
Basically, there is no acceptable solution for fixing Simple Supply Chain so it will work. The current source doesn't work and has never worked since code complete. It uses Sawtooth 1.0, which depends on the internal sawtooth.me repo, which is corrupt for Sawtooth 1.0. That is, sawtooth.me for ST 1.0 has a protoc and proto library that are mismatched, which causes Simple Supply Chain to always fail. Working around it (the PR) is not acceptable. Upgrading to use ST 1.1 is not acceptable. Fixing sawtooth.me is not acceptable. Nothing is acceptable :neutral_face:

dplumb (Thu, 20 Jun 2019 18:56:24 GMT):
Hi @danintel , what are your goals with Simple Supply right now? Definitely not trying to be a blocker on anything. My concerns are that it doesn't seem ideal to update Simple Supply to 1.1, since the app was created to support the Sawtooth App Dev course which is still based on 1.0. Regarding the protobuf issue, it is not a good practice to manually edit the generated protos, so another solution would be ideal. Let me know what you think.

Dan (Thu, 20 Jun 2019 18:59:38 GMT):
it sounds like the resolution would be to correct the protoc in the 1.0 apt repo. I don't understand the problem though. Is it too new or too old or too something else?

danintel (Fri, 21 Jun 2019 01:37:59 GMT):
The protoc compiler is newer than the library, which is not supported. The library can be newer but not the compiler. So protoc generates stuff unknown to the library.

danintel (Fri, 21 Jun 2019 01:39:12 GMT):
As far as 1.0 vs. 1.1, not much has changed from an API perspective... just the internal implementation

mfford (Fri, 21 Jun 2019 13:57:54 GMT):
REMINDER: The next Hyperledger Sawtooth Contributor Meeting is on Monday, June 24th at 10am CDT. The meeting information can be found on the Hyperledger Community Meetings Calendar located here: https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings We are finalizing the agenda for this meeting. If you have an appropriate topic you would like to discuss and facilitate, please add it to the agenda, located in the wiki here: https://wiki.hyperledger.org/pages/viewpage.action?pageId=13865440 Looking forward to seeing everyone there! -Mark

amundson (Fri, 21 Jun 2019 17:23:28 GMT):
@patelkishan most of us haven't used caliper, though I think some of its initial code was inspired by some of our workload generators from years ago. most of our load testing is done with the smallbank or intkey workload generators that are part of sawtooth. The code for that is in https://github.com/hyperledger/sawtooth-core/tree/master/perf . Setting up a cluster to test is usually best done on bare metal or in AWS. Docs for PBFT are here - https://sawtooth.hyperledger.org/docs/pbft/nightly/master/ Docs for installing Sawtooth (includes PBFT information) - https://sawtooth.hyperledger.org/docs/core/nightly/master/app_developers_guide/ubuntu_test_network.html

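As a rough illustration of measuring such a network (a sketch only, not Caliper or the sawtooth perf tools; it assumes GET /blocks?limit=1 returns the current chain head first):

```python
import time
import requests

def block_rate(rest_api_url, sample_secs=60):
    """Blocks committed per second, from two samples of the chain head height."""
    def head_num():
        head = requests.get(rest_api_url + "/blocks?limit=1").json()["data"][0]
        return int(head["header"]["block_num"])
    start = head_num()
    time.sleep(sample_secs)
    return (head_num() - start) / sample_secs

# e.g. with an intkey workload running against the network:
# print(block_rate("http://localhost:8008"))
```
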
amolk (Fri, 21 Jun 2019 20:34:11 GMT):
@patelkishan If you'll ping @rkrish82 about Caliper, he might be able to help. He uses Caliper regularly.

mfford (Mon, 24 Jun 2019 12:36:52 GMT):
REMINDER: The Hyperledger Sawtooth Contributor Meeting is today, Monday, June 24th at 10am CDT. PLEASE USE THIS LINK FOR THE MEETING: Join Zoom Meeting https://zoom.us/j/438462056 One tap mobile +16468769923,,438462056# US (New York) +14086380968,,438462056# US (San Jose) Dial by your location +1 646 876 9923 US (New York) +1 408 638 0968 US (San Jose) +1 669 900 6833 US (San Jose) Meeting ID: 438 462 056 Find your local number: https://zoom.us/u/acj8QRhsif The agenda for today's meeting is located in the wiki here: https://wiki.hyperledger.org/pages/viewpage.action?pageId=13865440

danintel (Mon, 24 Jun 2019 15:02:14 GMT):
I see... a new meeting ID for this month

JonGeater (Mon, 24 Jun 2019 19:32:54 GMT):
Yes, the calendar needs to be updated. I got kicked off the call twice before I noticed this

mfford (Mon, 24 Jun 2019 19:38:48 GMT):
The calendar was updated prior to the meeting this morning to fix the broken zoom. Should be good for future meetings.

lucgerrits (Tue, 25 Jun 2019 15:51:48 GMT):
Has joined the channel.

patelkishan (Wed, 26 Jun 2019 05:27:28 GMT):
I am running a sawtooth network in docker containers. https://sawtooth.hyperledger.org/docs/core/nightly/master/sysadmin_guide/grafana_configuration.html gives instructions for Ubuntu environment. How can I use Grafana if I set up the environment using only the docker-compose file of sawtooth-pbft?

arsulegai (Thu, 27 Jun 2019 06:27:59 GMT):
Isn't this dead code, or am I missing something here: https://github.com/hyperledger/sawtooth-core/blob/1591798b766cdea03940f0d994d8614c5ec69b34/validator/src/journal/candidate_block.rs#L415 ?

arsulegai (Thu, 27 Jun 2019 19:09:34 GMT):
I need help with another part of the code; could somebody please help me understand the comment here: https://github.com/hyperledger/sawtooth-core/blob/bb481f279c5e48c7c4ecd2d06b3a093743fce365/validator/sawtooth_validator/execution/scheduler_parallel.py#L540 ?

amundson (Thu, 27 Jun 2019 20:21:37 GMT):
@arsulegai I think that's just to prevent unscheduling all batches. This unschedule_incomplete_batches() thing won't get carried over into Transact.

amundson (Thu, 27 Jun 2019 20:24:03 GMT):
@arsulegai by dead code, do you mean never executed?

arsulegai (Fri, 28 Jun 2019 00:40:07 GMT):
@amundson No, I meant that it's a local variable, and because of the return statement this assignment is not useful.

arsulegai (Fri, 28 Jun 2019 12:28:52 GMT):
@amundson Could there be a corner case where this unschedule mechanism has left behind a batch which can never be completed, thus indefinitely delaying the candidate_block from completing the scheduled execution?

arsulegai (Fri, 28 Jun 2019 12:29:55 GMT):
I see that before adding batches to the candidate_block itself, there are checks to not consider a batch before all its dependencies are added. So I failed to prove the above statement

arsulegai (Fri, 28 Jun 2019 12:41:56 GMT):
The reason I went down this path is that I consistently see unscheduling happening when there's an indefinite wait at the validator

arsulegai (Fri, 28 Jun 2019 18:57:54 GMT):
Any help here, please?

pschwarz (Fri, 28 Jun 2019 19:42:33 GMT):
@arsulegai I think the only way that a batch would never be completed during execution would have to do with the Transaction Processor side of things

pschwarz (Fri, 28 Jun 2019 19:43:10 GMT):
If a transaction processor never returns, but still responds to pings from the validator, the outstanding transaction won't set the value.

pschwarz (Fri, 28 Jun 2019 19:43:54 GMT):
If it doesn't respond to pings, the connection will be closed and the transaction rescheduled

pschwarz (Fri, 28 Jun 2019 19:44:04 GMT):
(Due to internal error)

pschwarz (Fri, 28 Jun 2019 19:45:25 GMT):
The unscheduling happens when the consensus engine tells the publisher to publish the block

arsulegai (Fri, 28 Jun 2019 19:48:50 GMT):
The issue is seen with the intkey workload or even the smallbank workload. The consensus engine times out waiting for a response to the summarize_block call.

arsulegai (Fri, 28 Jun 2019 19:53:14 GMT):
But I agree with you @pschwarz that there's no such corner case possible for indefinite wait here

arsulegai (Fri, 28 Jun 2019 19:54:52 GMT):
There's another possible path for me to dig into: in all the error logs I have, there's one more common pattern before this timeout error

arsulegai (Fri, 28 Jun 2019 19:56:34 GMT):
That is, the node that fails to summarize and respond back receives the block from another validator at the same time as its own consensus engine finalizes the block

arsulegai (Fri, 28 Jun 2019 19:58:17 GMT):
This sounds like a normal scenario, and still I doubt I'll get meaningful analysis by digging further down this path

arsulegai (Mon, 01 Jul 2019 05:02:37 GMT):
Is this a possible bug that's fixed https://github.com/hyperledger/sawtooth-core/commit/358be02cb7f32d914b03e2bce3017a1c929cde41 ? @ltseeley

ltseeley (Mon, 01 Jul 2019 15:20:32 GMT):
I'm not sure I understand the question?

arsulegai (Mon, 01 Jul 2019 15:29:43 GMT):
The commit message says it's to avoid adding a duplicate batch to the candidate_block; I wanted to know if there were issues with it

arsulegai (Wed, 03 Jul 2019 05:08:17 GMT):
Question: is it intended that the validator sends a duplicate block to the consensus engine (particularly a block which is already committed)?

ltseeley (Mon, 08 Jul 2019 13:56:53 GMT):
The validator should not send duplicate blocks to consensus; it should only send each block to consensus once, and they should be in order.

arsulegai (Mon, 08 Jul 2019 14:21:32 GMT):
How about in the case of the gossip network, when blocks are sent out?

arsulegai (Mon, 08 Jul 2019 14:23:10 GMT):
I have a log where the same block is received by the consensus engine at least twice

ltseeley (Mon, 08 Jul 2019 15:10:14 GMT):
What version of the validator are you using? What's the scenario that produces that behavior?

arsulegai (Mon, 08 Jul 2019 15:53:12 GMT):
Got it while testing PoET with Validator 1.2

arsulegai (Mon, 08 Jul 2019 15:53:38 GMT):
It was an LR (long-running) setup where this scenario was observed

jsmitchell (Mon, 08 Jul 2019 15:59:37 GMT):
do you have snippets of the validator and consensus engine logs around that event?

arsulegai (Mon, 08 Jul 2019 16:24:28 GMT):
[attachment] poet-engine-debug.log

arsulegai (Mon, 08 Jul 2019 16:24:50 GMT):
[attachment] validator-debug.log

arsulegai (Mon, 08 Jul 2019 16:24:51 GMT):
Attached debug logs for PoET and Validator, you may consider the block 37e54461523b486fa8fe9b4e06da2c8cdbd8ebe3339478e4f0f051030c5b192f370551690206153241524062ae9e24d240c34f5055ec9cd53f40c71a10853f28 for example.

arsulegai (Mon, 08 Jul 2019 16:25:13 GMT):
This duplicate arrival caused another issue in PoET, for which a PR has been raised on sawtooth-poet

ltseeley (Mon, 08 Jul 2019 17:06:07 GMT):
Hmm, I have not seen that before. It looks like that block is getting validated twice, so my guess is that's a bug in the completer.

arsulegai (Mon, 08 Jul 2019 18:21:54 GMT):
I am surprised that it's consistent in all PoET LRs

pschwarz (Tue, 09 Jul 2019 14:52:37 GMT):
Added a new RFC for an Event Processor SDK API: https://github.com/hyperledger/sawtooth-rfcs/pull/48

rjones (Tue, 09 Jul 2019 17:02:12 GMT):
Has left the channel.

arsulegai (Wed, 10 Jul 2019 14:01:21 GMT):
@ltseeley Could it be that the same block is scheduled twice for validation, without checking whether it's already scheduled?

AlexanderZhovnuvaty (Wed, 10 Jul 2019 14:48:38 GMT):
Has joined the channel.

arsulegai (Wed, 10 Jul 2019 16:03:58 GMT):
Any help here is useful. I would like to understand if there are behavior changes in the validator since 1.1; we see more forking in the 1.2 candidate than in 1.1 with the same PoET binary

Dan (Wed, 10 Jul 2019 17:00:51 GMT):
Based on a call I just had with @amolk I understand that master is not exhibiting the same forking? As in #sawtooth-release can you specify which build is exhibiting the forking?

ltseeley (Wed, 10 Jul 2019 18:44:01 GMT):
It's possible, but I can't say for sure.

Dan (Wed, 10 Jul 2019 19:24:42 GMT):
I haven't tracked the journal oxidation closely. Is the python completer still the active code? (That hasn't seen a change in nearly a year) or is there a rust replacement? I don't see an obvious one here: https://github.com/hyperledger/sawtooth-core/tree/master/validator/src/journal

ltseeley (Wed, 10 Jul 2019 20:58:32 GMT):
Completer is still Python

arsulegai (Mon, 15 Jul 2019 09:57:48 GMT):
Any information on why a validator would take too long to process a block?

arsulegai (Mon, 15 Jul 2019 09:58:20 GMT):
Example:
t1: Adding block XYZ for processing
t1 + 6 min: Block XYZ passed validation

ltseeley (Mon, 15 Jul 2019 14:13:23 GMT):
How many transactions are in the block?

arsulegai (Mon, 15 Jul 2019 14:14:19 GMT):
1 txn / 1 batch, max batches in a block set to 1000

ltseeley (Mon, 15 Jul 2019 14:24:26 GMT):
Hmm, what's CPU usage look like while it's processing the block? Is the validator in question in-sync with the rest of the network?

arsulegai (Mon, 15 Jul 2019 14:31:39 GMT):
Nodes are in sync; all the nodes seem to have slowed themselves down because of block validation time

arsulegai (Mon, 15 Jul 2019 14:32:45 GMT):
There's no other activity happening except waiting for the block validator to be triggered. Because this is slow, all the nodes are trying to publish a block and more forks are seen, eventually making validators validate more blocks as time progresses

arsulegai (Mon, 15 Jul 2019 14:35:10 GMT):
Does it wait on a thread to be released, or on an event? I couldn't make this out from the code

ltseeley (Mon, 15 Jul 2019 14:53:24 GMT):
Do the TP logs show anything interesting?

arsulegai (Mon, 15 Jul 2019 15:06:18 GMT):
Nope :(

arsulegai (Mon, 15 Jul 2019 15:07:17 GMT):
We then started playing with different configuration options, so that it can run smoothly at 10 TPS

arsulegai (Mon, 15 Jul 2019 15:18:58 GMT):
Is there a possibility that the validator is waiting on the interconnect thread?

arsulegai (Mon, 15 Jul 2019 15:19:32 GMT):
I see that one of the validators got disconnected, and this one is printing too many "unable to send NETWORK_ACK" messages

pschwarz (Tue, 16 Jul 2019 17:18:30 GMT):
@arsulegai Could you add a link to the master PR in your backport PR?

arsulegai (Tue, 16 Jul 2019 17:24:29 GMT):
Ok

pschwarz (Tue, 16 Jul 2019 17:28:53 GMT):
thanks

pschwarz (Tue, 16 Jul 2019 17:29:03 GMT):
Makes it easier for mental correlation :)

arsulegai (Tue, 16 Jul 2019 17:46:51 GMT):
Agree

danintel (Mon, 22 Jul 2019 15:01:29 GMT):
Trying to find the valid meeting ID for now. It is always changing!

jsmitchell (Mon, 22 Jul 2019 15:01:53 GMT):
https://zoom.us/j/438462056

danintel (Mon, 22 Jul 2019 15:02:13 GMT):
ty

jsmitchell (Mon, 22 Jul 2019 16:02:21 GMT):
@arsulegai let's pick that conversation up here

jsmitchell (Mon, 22 Jul 2019 16:03:28 GMT):
what log messages are associated with the event where the block predecessor is determined to be missing and re-added to the pending queue?

arsulegai (Mon, 22 Jul 2019 16:03:32 GMT):
I was talking about the time interval between "block XYZ added for processing" and "block XYZ validated"

arsulegai (Mon, 22 Jul 2019 16:03:54 GMT):
Oh! Re-adding to the pending queue is a different issue

jsmitchell (Mon, 22 Jul 2019 16:05:43 GMT):
the re-adding to pending queue issue is the one you had up on your screen, right?

arsulegai (Mon, 22 Jul 2019 16:06:19 GMT):
Yes, then towards the end of the call I brought up the timing issue

arsulegai (Mon, 22 Jul 2019 16:06:30 GMT):
Ok, let's pick up re-adding to pending queue first

jsmitchell (Mon, 22 Jul 2019 16:06:54 GMT):
do you have the raw log message line for when it decided that the already added block was missing?

arsulegai (Mon, 22 Jul 2019 16:07:25 GMT):
These traces were from the same log I posted a few days ago

arsulegai (Mon, 22 Jul 2019 16:07:45 GMT):
Here it is https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=Km7eKH4urMo8yKxYK

jsmitchell (Mon, 22 Jul 2019 16:07:56 GMT):
I just want that one message

arsulegai (Mon, 22 Jul 2019 16:09:36 GMT):
I didn't get you; how about I point out the log traces that I am referring to?

arsulegai (Mon, 22 Jul 2019 16:09:58 GMT):
Give me a couple of minutes; let me pull them up from the original log file

jsmitchell (Mon, 22 Jul 2019 16:12:46 GMT):
Is this the log message in question: https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_scheduler.rs#L127

arsulegai (Mon, 22 Jul 2019 16:15:56 GMT):
Yes

arsulegai (Mon, 22 Jul 2019 16:16:30 GMT):
I didn't expect this trace when the previous block is in the pending queue

jsmitchell (Mon, 22 Jul 2019 16:16:48 GMT):
@pschwarz looks like status can be https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_wrapper.rs#L25, but https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_scheduler.rs#L177 must be returning BlockStatus::Unknown

pschwarz (Mon, 22 Jul 2019 16:17:44 GMT):
Hmm

arsulegai (Mon, 22 Jul 2019 16:18:02 GMT):
Before going there, can there be any intermediate state where a block is neither in pending nor in processing?

jsmitchell (Mon, 22 Jul 2019 16:18:18 GMT):
it's not clear to me where stuff is inserted into the BlockStatusStore

jsmitchell (Mon, 22 Jul 2019 16:18:53 GMT):
Missing != Unknown

jsmitchell (Mon, 22 Jul 2019 16:23:50 GMT):
getting to that log message means that these checks must have failed: https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_scheduler.rs#L105 and 9 lines later

arsulegai (Mon, 22 Jul 2019 16:26:14 GMT):
Can this happen?
Z -> A -> B -> C (one fork)
Z -> A -> D (another fork)
1. Block A is added to pending because Z is in process
2. Block D arrives, and then we see the trace you pointed out (A's status is unknown), so A is added to processing. ---> I expected A to be in the pending queue here
3. Block A passes validation --> New Block event triggered to the consensus engine
4. Chain head updated to A
5. Block A passes validation again --> New Block event triggered to the consensus engine

jsmitchell (Mon, 22 Jul 2019 16:28:06 GMT):
@pschwarz all access to the BlockSchedulerState is controlled by the BlockScheduler?

jsmitchell (Mon, 22 Jul 2019 16:28:51 GMT):
because both of the methods that modify the state are managed by a mutex in BlockScheduler

pschwarz (Mon, 22 Jul 2019 16:32:47 GMT):
Looks like

arsulegai (Mon, 22 Jul 2019 16:33:35 GMT):
[attachment] ref-validator-debug.log

jsmitchell (Mon, 22 Jul 2019 16:37:52 GMT):
is this easily reproducible @arsulegai ?

jsmitchell (Mon, 22 Jul 2019 16:38:36 GMT):
try adding a log line to https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_scheduler.rs#L195

jsmitchell (Mon, 22 Jul 2019 16:39:28 GMT):
to see when the processing and pending queues are modified for the missing block

arsulegai (Mon, 22 Jul 2019 16:39:35 GMT):
Before the PoET patch, we could tell this happened by seeing either the stalled network or the KeyError issue. Now it's difficult to know which block it happens for, but yes, it always happens

arsulegai (Mon, 22 Jul 2019 16:43:54 GMT):
Quick question: https://github.com/hyperledger/sawtooth-core/blob/0e5f143fcfff7046d3647a042861b46b19ebc9b3/validator/src/journal/block_scheduler.rs#L197 what is in this list?

jsmitchell (Mon, 22 Jul 2019 16:46:00 GMT):
It's not obvious to me that this is the right place to log for the update to the block status store (the thing that's returning Unknown), but I think this is it: https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_validator.rs#L372

jsmitchell (Mon, 22 Jul 2019 16:50:45 GMT):
it looks like a list of the immediate children of that block

jsmitchell (Mon, 22 Jul 2019 16:50:58 GMT):
(in the case of forks, there could be multiple immediate children)

jsmitchell (Mon, 22 Jul 2019 16:51:46 GMT):
that pending.remove doesn't seem right though....

jsmitchell (Mon, 22 Jul 2019 16:51:56 GMT):
https://github.com/hyperledger/sawtooth-core/blob/0e5f143fcfff7046d3647a042861b46b19ebc9b3/validator/src/journal/block_scheduler.rs#L202

jsmitchell (Mon, 22 Jul 2019 16:53:30 GMT):
it's returning the list of ready blocks

jsmitchell (Mon, 22 Jul 2019 16:54:08 GMT):
which ends up here: https://github.com/hyperledger/sawtooth-core/blob/0e5f143fcfff7046d3647a042861b46b19ebc9b3/validator/src/journal/block_validator.rs#L341

jsmitchell (Mon, 22 Jul 2019 16:56:30 GMT):
which goes over these channels

arsulegai (Mon, 22 Jul 2019 17:02:50 GMT):
Should they be added to processing before sent?

jsmitchell (Mon, 22 Jul 2019 17:04:49 GMT):
i'm not sure I understand the semantics of those different queues

jsmitchell (Mon, 22 Jul 2019 17:06:08 GMT):
but, it does seem like the status of the dependent blocks is lost when done() is called and those dependent blocks are removed from pending

jsmitchell (Mon, 22 Jul 2019 17:07:53 GMT):
at least while they are on these channels

arsulegai (Mon, 22 Jul 2019 17:09:38 GMT):
These channels are for block validation, right?

arsulegai (Mon, 22 Jul 2019 17:12:21 GMT):
Looking at those log traces and the code, I think the descendant blocks should be added to the processing queue before being sent for validation

arsulegai (Mon, 22 Jul 2019 17:13:56 GMT):
We remove blocks from pending -> consider them ready for processing -> but miss adding them to the processing queue!

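To make that failure mode concrete, here is a toy model of the scheduler state being described (plain Python standing in for block_scheduler.rs; the names and the exact checks are illustrative, not the real code):

```python
pending = {}       # predecessor block_id -> [block_ids waiting on it]
processing = set() # block_ids currently out for validation
validated = set()  # block_ids whose validation completed

def schedule(block_id, previous_id):
    if previous_id in validated:
        processing.add(block_id)                      # ready now
    elif previous_id in processing or previous_id in pending:
        pending.setdefault(previous_id, []).append(block_id)
    else:
        # Predecessor status is Unknown: (re)schedule it -- the
        # "re-added to the pending queue" trace in block_scheduler.rs
        processing.add(previous_id)
        pending.setdefault(previous_id, []).append(block_id)

def done(block_id):
    processing.discard(block_id)
    validated.add(block_id)
    # The suspected bug: descendants are returned for validation without
    # being added to `processing` first.
    return pending.pop(block_id, [])

# Reproduce the Z -> A, then D scenario:
processing.add("Z")
schedule("A", "Z")   # A waits on Z
ready = done("Z")    # returns ["A"]; A goes out for validation, untracked
schedule("D", "A")   # A looks Unknown -> A is scheduled a second time
```
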
jsmitchell (Mon, 22 Jul 2019 17:14:08 GMT):
@pschwarz do you understand the intent behind this processing ~queue~ list?

arsulegai (Mon, 22 Jul 2019 17:22:32 GMT):
Taking the Z, A, B, C, D example I posted earlier

arsulegai (Mon, 22 Jul 2019 17:22:44 GMT):
https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=FoTKYXaz6EMjYLrcS

arsulegai (Mon, 22 Jul 2019 17:23:06 GMT):
When Z is done processing and passes validation, A is removed from the pending queue

arsulegai (Mon, 22 Jul 2019 17:23:16 GMT):
D arrives, doesn't know A's status yet

arsulegai (Mon, 22 Jul 2019 17:25:53 GMT):
Whereas A was silently sent for validation earlier, now because of D the block A is added to the processing queue again

arsulegai (Mon, 22 Jul 2019 17:37:19 GMT):
Continuing this: if A had completed the validation process before the processing queue was processed further, the issue would have been masked

arsulegai (Mon, 22 Jul 2019 17:38:16 GMT):
What we see is that block A had already been added to processing (sent for validation) before that

jsmitchell (Mon, 22 Jul 2019 21:26:08 GMT):
@arsulegai we think we have a fix for this. @pschwarz is building/running tests and we'll probably try to get a build in an LR environment tomorrow.

arsulegai (Mon, 22 Jul 2019 23:22:52 GMT):
Cool! Could you please keep me in sync when the PR is ready?

pschwarz (Tue, 23 Jul 2019 16:25:57 GMT):
Here's the PR: https://github.com/hyperledger/sawtooth-core/pull/2217

arsulegai (Tue, 23 Jul 2019 17:36:45 GMT):
Is the scenario where the processing queue already has that block taken care of?

pschwarz (Tue, 23 Jul 2019 18:21:32 GMT):
No, because it wouldn't be in the descendant block list

pschwarz (Tue, 23 Jul 2019 18:22:53 GMT):
They can't be in processing if they have an ancestor that hasn't been processed yet

arsulegai (Wed, 24 Jul 2019 02:27:26 GMT):
Right

jsmitchell (Wed, 24 Jul 2019 14:26:21 GMT):
@arsulegai do you have log examples of this delay in processing/notification you were talking about? Do they still happen on a build with PR 2217?

arsulegai (Wed, 24 Jul 2019 14:30:55 GMT):
We didn't disturb our earlier LR setup; it's now going on well beyond LR7. I will see if we can squeeze in another setup to run an LR with PR 2217.

arsulegai (Wed, 24 Jul 2019 14:31:48 GMT):
About the log example from earlier runs, let me check if I have one available to share

arsulegai (Wed, 24 Jul 2019 14:59:05 GMT):
[attachment] gaia-desktop-validator-debug.log

jsmitchell (Wed, 24 Jul 2019 15:02:30 GMT):
is there a timestamp in particular?

arsulegai (Wed, 24 Jul 2019 15:05:28 GMT):
Interesting timestamps: a block passed validation at 14:58:17.193; the next one was at 15:08:48.405

arsulegai (Wed, 24 Jul 2019 15:06:56 GMT):
Another: a block passed validation at 15:16:38.865 and the next one was at 15:25:30.995

jsmitchell (Wed, 24 Jul 2019 15:07:31 GMT):
have you done any inspection of what the poet timers were in these cases?

jsmitchell (Wed, 24 Jul 2019 15:08:13 GMT):
we've seen cases where all the nodes will get quite unlucky and you'll get 5+ minute inter block times

arsulegai (Wed, 24 Jul 2019 15:08:36 GMT):
There was a case where one validator got away from the rest of the network and started producing longer wait times. This was in SIM mode

arsulegai (Wed, 24 Jul 2019 15:09:18 GMT):
Yes, I have seen cases of longer wait times

arsulegai (Wed, 24 Jul 2019 15:10:45 GMT):
It's been a week or so since I last checked, but there was a case where PoET completed the block creation process and was then waiting for BLOCK NEW from the validator; however, it didn't receive one for a long time (6~7 min)

jsmitchell (Wed, 24 Jul 2019 15:11:08 GMT):
ok, if you can find one of those instances, please point it out in the logs

arsulegai (Wed, 24 Jul 2019 15:11:50 GMT):
I don't recall which log file that was, but the observation at the time was that I saw many validator-validator reconnection traces.

arsulegai (Wed, 24 Jul 2019 15:12:40 GMT):
Did the team over there start an LR with this patch?

jsmitchell (Wed, 24 Jul 2019 15:13:50 GMT):
yeah

jsmitchell (Wed, 24 Jul 2019 15:14:05 GMT):
no instances of rescheduling already scheduled blocks

arsulegai (Wed, 24 Jul 2019 15:16:47 GMT):
What are the config settings for PoET for the run?

jsmitchell (Wed, 24 Jul 2019 15:17:30 GMT):
10 nodes, 10 tps intkey, and I'm guessing 30 second target_wait_time and 300 second initial_wait_time

arsulegai (Wed, 24 Jul 2019 15:18:33 GMT):
That's great. The only reason we used a 10 second target_wait_time with 200 as max_batches_per_block was to avoid a long gap between two block validations

arsulegai (Wed, 24 Jul 2019 15:19:15 GMT):
BTW, for the validator logs which I posted, here's the trace from PoET which shows that there's 5 min between block finalization and the CONSENSUS NEW BLOCK event:

arsulegai (Wed, 24 Jul 2019 15:19:30 GMT):
[15:03:29.201 [MainThread] engine INFO] Published block 84eb268d69288fabd79e43ea2e183573178ab207e32b2ad959d37945c94fa74b5114720f233a626a68e71972a586937baa609f388d1777df68a5bbaa574604ce
[15:08:49.421 [MainThread] engine DEBUG] Received message: CONSENSUS_NOTIFY_BLOCK_NEW

arsulegai (Wed, 24 Jul 2019 15:20:33 GMT):
The PoET wait timer value is reasonable. It's around 40~50 seconds for the blocks around this time.

jsmitchell (Wed, 24 Jul 2019 15:21:07 GMT):
it's a distribution

arsulegai (Wed, 24 Jul 2019 15:25:57 GMT):
A question, because I wasn't there earlier: how did we arrive at the 30 sec target_wait_time and 300 sec initial_wait_time? Why not just any arbitrary time? Were there differences in behavior with the size of the network and the time values?

jsmitchell (Wed, 24 Jul 2019 15:47:55 GMT):
math + experience

jsmitchell (Wed, 24 Jul 2019 15:48:06 GMT):
10 second target wait time is going to cause a lot of forks

jsmitchell (Wed, 24 Jul 2019 15:48:47 GMT):
you are shifting the random distribution such that it makes it highly likely that two or more validators will publish what they think are winning blocks within the block validation duration

arsulegai (Wed, 24 Jul 2019 15:50:19 GMT):
Hmm

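A simplified way to see that effect numerically (an illustrative model only: each validator's wait time is drawn from an exponential whose mean is target_wait_time * network_size, roughly PoET's steady-state local mean; real PoET adds a minimum wait time, population estimation, and other policy tests):

```python
import random

def fork_probability(target_wait, n_nodes=10, validation_secs=3.0, trials=20000):
    """Estimate how often the two shortest wait times land within the
    block-validation window -- i.e., two 'winning' blocks and a fork."""
    local_mean = target_wait * n_nodes
    forks = 0
    for _ in range(trials):
        waits = sorted(random.expovariate(1.0 / local_mean) for _ in range(n_nodes))
        if waits[1] - waits[0] < validation_secs:
            forks += 1
    return forks / trials

# fork_probability(10) comes out around 0.24, fork_probability(30) around
# 0.09: a 10s target_wait_time forks far more often than a 30s one.
```
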
Dan (Wed, 24 Jul 2019 15:57:23 GMT):
@rberg2 @rbuysse I'm close to calling for FCP on https://github.com/hyperledger/sawtooth-rfcs/pull/45 but would like your reviews first.

arsulegai (Wed, 24 Jul 2019 16:17:03 GMT):
Hello all, I was working on a tool and it has helped me analyze logs faster. The tool is specifically for log analysis of software that is state-machine based. *For example:*
1. To check if the consensus engine is working as expected, if it is handling all the blocks gracefully, and if all the responses from the validator are received at the consensus engine.
2. The case which we just solved (sending duplicate blocks to the consensus engine): identify which blocks are validated twice and sent to the consensus engine, without peeping into huge log files.
Here's the GitHub link for the tool: https://github.com/arsulegai/state-checker I welcome your feedback

arsulegai (Wed, 24 Jul 2019 16:20:34 GMT):
Also, I see @Dan has debug tools specifically designed for PoET. It would be nice to have all such tools listed in one place. Please share your ways of debugging or the tools you have which can help move things faster.

rbuysse (Wed, 24 Jul 2019 16:32:41 GMT):
@dan I added some comments

Dan (Wed, 24 Jul 2019 16:53:13 GMT):
@danintel looks like a couple minor things to update and then I can motion for FCP.

jsmitchell (Wed, 24 Jul 2019 16:57:16 GMT):
@arsulegai @Dan I think there might be an issue with poet-engine and timely evaluation of candidate blocks on a network where one node is trying to catch up

jsmitchell (Wed, 24 Jul 2019 16:58:16 GMT):
it's something you should see if you can track down

jsmitchell (Wed, 24 Jul 2019 17:00:33 GMT):
looking at the poet engine logs for a node that is about 10 blocks back, it seems to regularly attempt to build a competitive block and then win on lower wait timers vs the 'legitimate' blocks. Eventually the aggregate local mean flips it back the other way. What I would expect to see is rapid evaluation of the sequence of legitimate blocks, which would abort local publishing.

danintel (Wed, 24 Jul 2019 17:23:05 GMT):
@Dan OK. New suggestions keep on coming in

arsulegai (Wed, 24 Jul 2019 17:24:33 GMT):
@jsmitchell Yes, I remember such a sequence of events happening. Is the node catch-up causing block validation to slow down, while the others are not publishing because they're either catching up or end up resolving forks?

jsmitchell (Wed, 24 Jul 2019 17:34:50 GMT):
The others are all in consensus and continue to publish blocks

jsmitchell (Wed, 24 Jul 2019 17:36:12 GMT):
We think the 'validate before notify' requirement of poet is probably resulting in a cpu bottleneck in validation, which is resulting in late notification to the engine

jsmitchell (Wed, 24 Jul 2019 17:36:29 GMT):
and of course, the engine decides it should keep publishing, which exacerbates the issue

Dan (Wed, 24 Jul 2019 17:48:44 GMT):
@danintel yes, one of the tricks for us to work out with the RFC process is to compress the feedback cycle. I think the last time I checked I had something like 40 PRs to review (not sure how many of those were RFCs), so it can be tough for everyone to prioritize the same issue at the same time. It's probably a good practice for the RFC champion to pick the two or three most relevant reviewers and try to get their review early on. That way the most substantial changes are up front and the rest of the reviews are more stable. In this case I should have thought to ping the folks who have been most involved in the docker files earlier on in the process.

arsulegai (Wed, 24 Jul 2019 18:06:21 GMT):
PoET 2 doesn't have such a constraint as far as I know; what is it in PoET 1 that expects 'validate before notify'?

jsmitchell (Wed, 24 Jul 2019 19:31:50 GMT):
Consensus depends on state

jsmitchell (Wed, 24 Jul 2019 19:31:59 GMT):
Settings, registry, etc.

jsmitchell (Wed, 24 Jul 2019 19:38:06 GMT):
I'm pretty sure poet2 would be in the same situation

danintel (Wed, 24 Jul 2019 22:07:36 GMT):
@Dan The original review was on Google Docs, which worked well. The follow-on comments continued with the PR, which I did not expect. They were good comments, but I wish they were submitted early on in the process. Maybe if I guessed who I could ping, that would have worked better.

arsulegai (Thu, 25 Jul 2019 00:23:47 GMT):
@jsmitchell I still fail to understand why a block needs to be validated before notify. It could still be done when the consensus engine says the block is valid. Unless the block is committed, the state values wouldn't be updated, right? So consensus cannot depend on values set in the current block.

jsmitchell (Thu, 25 Jul 2019 00:42:39 GMT):
At a minimum, the prior block needs to be evaluated and the new state root set so that the new block can be evaluated correctly. For example, a transaction in the prior block may register the signer of the new block in the validator registry, which allows the current block’s consensus payload to be evaluated correctly.

arsulegai (Thu, 25 Jul 2019 01:49:07 GMT):
Correct, that will help speed up block validation without waiting for the consensus engine. At the consensus engine end, the registration information is considered only after block commit (in PoET, when the VR TP writes the global state). Early validation helps utilize free time at the validator end. For example, if there are forks at the same height then the consensus engine becomes a bottleneck, because it will serialize the way blocks are validated. But there's still no strict mandate from the consensus engine for early validation.

anandakumar.n (Thu, 25 Jul 2019 06:41:18 GMT):
Hello all! I would like to know whether any remote attestation is happening in SGX PoET consensus or not.

Dan (Thu, 25 Jul 2019 15:17:55 GMT):
Yes. PoET has 2 modes. One is simulated and makes no TEE calls. The other is an SGX implementation. You can see some of the sgx calls here: https://github.com/hyperledger/sawtooth-poet/blob/master/sgx/sawtooth_poet_sgx/libpoet_enclave/poet_enclave.cpp

Dan (Thu, 25 Jul 2019 15:18:28 GMT):
and the calls to the attestation service here: https://github.com/hyperledger/sawtooth-poet/tree/master/ias_client

arsulegai (Thu, 25 Jul 2019 16:05:21 GMT):
Can we have a nightly docker image pushed for the 1-2 branch until it's released?

amundson (Thu, 25 Jul 2019 21:49:29 GMT):
@arsulegai we could tag a new component release - which docker image?

arsulegai (Fri, 26 Jul 2019 00:19:20 GMT):
The validator

rbuysse (Fri, 26 Jul 2019 14:31:23 GMT):
there are chime tagged docker images for all components right now.

rbuysse (Fri, 26 Jul 2019 14:31:34 GMT):
I'll create a story for doing chime nightlies.

arsulegai (Fri, 26 Jul 2019 14:49:44 GMT):
Thanks

SethiSaab (Mon, 29 Jul 2019 14:07:57 GMT):
Has joined the channel.

SethiSaab (Mon, 29 Jul 2019 14:11:48 GMT):
Hi team, I am currently working on a transaction addressing scheme. As per my understanding, we need an addressing and namespace technique to get and set data. Now say I have 10 attributes and I want to create a query which gives me the result of the same transaction, no matter which parameter I use. How should I do that? And how will this work in case I have an attribute of array type?

pschwarz (Mon, 29 Jul 2019 15:18:04 GMT):
Hi SethiSaab - your question is more targeted towards the general #sawtooth channel, which is a user channel. This channel is for platform development.

jamesbarry (Wed, 31 Jul 2019 16:24:10 GMT):
Has joined the channel.

ArpanNag (Mon, 05 Aug 2019 14:16:49 GMT):
Has joined the channel.

arsulegai (Wed, 07 Aug 2019 16:51:14 GMT):
@jsmitchell Could you please update the status of PoET tests done from your end?

arsulegai (Thu, 08 Aug 2019 04:53:10 GMT):
I would like to join the debug/analysis on this issue pointed out here https://chat.hyperledger.org/channel/sawtooth-governance?msg=H9wZ9eRj9QziZWu9w

arsulegai (Thu, 08 Aug 2019 04:56:59 GMT):
In our runs, PoET-SGX did not fork, unlike PoET-SIM with the same settings. PoET-SIM also didn't fork much with changed config values. So I am currently looking into the generated wait time values and the functionality around them, to make sure block creation is not blocked because of the consensus engine.

amundson (Thu, 08 Aug 2019 14:30:04 GMT):
initially, @rberg2 is running an additional set of LR tests to try and determine if it is a regression or not

amundson (Thu, 08 Aug 2019 14:32:43 GMT):
one working theory suggests that the bug is not new, but the CPU pressure points have changed and we are seeing an existing issue manifest itself in a way that it did not previously; but, we are running the tests to first compare with 1.1 behavior with the same settings and environment

amundson (Thu, 08 Aug 2019 14:33:32 GMT):
in particular, that the extra work the validator does during fork resolution (working on invalid forks) is at the heart of the problem

amundson (Thu, 08 Aug 2019 14:34:05 GMT):
since PBFT doesn't use forking, we don't see the issue there at all

amundson (Thu, 08 Aug 2019 14:35:50 GMT):
to fix that, we would have to make PoET a bit smarter so that it is nearly always working on the valid fork

amundson (Thu, 08 Aug 2019 14:39:01 GMT):
the core issue there is that, during fork resolution, you need to have calculated the PoET settings for the previous block. so if we optimize for the fact that those settings rarely change, that should have a substantial impact on being able to select the right fork without always running through all the transactions for every block prior to fork resolution

amundson (Thu, 08 Aug 2019 14:39:59 GMT):
that is one of many ideas

amundson (Thu, 08 Aug 2019 14:40:50 GMT):
first we need to try and determine if we can cause the issue to reliably occur, so we can iterate on a fix

amundson (Thu, 08 Aug 2019 14:41:56 GMT):
@arsulegai do you actually have PoET-SGX working with 1.2?

arsulegai (Thu, 08 Aug 2019 15:27:39 GMT):
Yes, there's an LR1 pass report. That was when we were trying different config options for PoET-SIM, before the duplicate block schedule fix in the validator.

arsulegai (Sun, 11 Aug 2019 14:01:51 GMT):
The time the Validator spends validating blocks that could eventually end up on a fork that won't grow can be reduced.

arsulegai (Sun, 11 Aug 2019 14:06:10 GMT):
Here's an idea. Currently the Validator executes/validates a block before sending it to the consensus engine.
1. The Validator would instead send the block to the consensus engine as it receives it, and simultaneously send it for processing if a thread is available.
2. The consensus engine (PoET, for example) validates the consensus field in the block.
3. PoET then also applies a partial fork resolution (at least to rule out cases where a block has no chance of getting committed). Other consensus engines can do their respective validations as well.
4. Based on the response from the consensus engine, the Validator can then decide to either wait for the block to complete processing or remove it from the scheduler.
5. If the block is valid, the consensus engine is notified.
6. The consensus engine then decides whether to commit or ignore the block; in the case of PoET, this is where the actual (or remaining) fork resolution happens.

arsulegai (Sun, 11 Aug 2019 14:08:56 GMT):
This may improve performance, but it requires design changes in both the Validator and PoET.

arsulegai (Sun, 11 Aug 2019 14:10:45 GMT):
We can consider this after the 1-2 release, if not now.

jsmitchell (Mon, 12 Aug 2019 15:17:45 GMT):
we are thinking roughly along those lines @arsulegai

jsmitchell (Mon, 12 Aug 2019 15:18:29 GMT):
poet consensus validation depends on state (settings and validator registry), so we can't blindly process payloads

jsmitchell (Mon, 12 Aug 2019 15:21:33 GMT):
an optimization would be determining whether a given block contains a relevant consensus state-impacting transaction. If a chain of blocks for evaluation doesn't contain those, then the consensus payloads can be quickly evaluated with block validation occurring after the fact as a required async step. If a block didn't validate, that would require the consensus engine to pick a next best fork for evaluation. This should result in minimum effort for fork resolution.

jsmitchell (Mon, 12 Aug 2019 15:23:38 GMT):
But that is a future step. First, we are going to change the block validation state model to notify consensus before the block is validated (as the design intends), which will give the consensus engine additional knowledge regarding the work outstanding. This will allow it _not to publish_ competitive blocks.

arsulegai (Mon, 12 Aug 2019 15:30:01 GMT):
That sounds OK to me. To start with, I will consider the possible optimization in PoET 2. If the testing goes well, we can discuss backporting it to the current PoET.

Heena078 (Mon, 19 Aug 2019 12:18:09 GMT):
Has joined the channel.

duncanjw (Tue, 20 Aug 2019 11:00:47 GMT):
Hi. In other news we have open sourced our DAML on Sawtooth implementation - https://github.com/blockchaintp/daml-on-sawtooth

duncanjw (Tue, 20 Aug 2019 11:01:55 GMT):
It’s early days and we still have to formally create an RFC and see if the sawtooth community is interested in us contributing this to the upstream project

duncanjw (Tue, 20 Aug 2019 11:04:48 GMT):
Please direct technical questions to @kodonnel

arsulegai (Tue, 20 Aug 2019 12:42:16 GMT):
@LeonardoCarvalho ^

LeonardoCarvalho (Wed, 21 Aug 2019 11:52:33 GMT):
Yeah, I'm restarting to develop my SDK flavor. :)

wkatsak (Thu, 22 Aug 2019 14:59:47 GMT):
Good morning. I've noticed an odd little race that I can reproduce in the test_config_smoke test

wkatsak (Thu, 22 Aug 2019 15:00:42 GMT):
I'm debugging a patch that adds approximately 500ms to the validator's socket setup() function (a hostname query to determine whether a host uses IPv6)
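
(A lookup along the lines described can be sketched with the standard library; the function name and port here are illustrative, not the actual patch. The getaddrinfo call is the blocking DNS query responsible for the added latency:)
```
import socket

def host_supports_ipv6(hostname, port=8800):
    # Returns True if the host resolves to any IPv6 address.
    # This DNS lookup can block for hundreds of milliseconds.
    try:
        return len(socket.getaddrinfo(hostname, port, socket.AF_INET6)) > 0
    except socket.gaierror:
        return False
```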

wkatsak (Thu, 22 Aug 2019 15:00:59 GMT):
when this patch is in place, the test_config_smoke will hang

wkatsak (Thu, 22 Aug 2019 15:01:09 GMT):
if I remove the line in question, it passes

wkatsak (Thu, 22 Aug 2019 15:01:18 GMT):
if I add a time.sleep(0.4), it also hangs

wkatsak (Thu, 22 Aug 2019 15:02:06 GMT):
I'm thinking this might be a docker-compose race, but I'm noticing that none of the devmode entries in the tests have any dependency listed

wkatsak (Thu, 22 Aug 2019 15:02:16 GMT):
has anyone noticed this or thought about this?

wkatsak (Thu, 22 Aug 2019 15:08:40 GMT):
If I add a sleep 2.0 to the devmode command line, it works fine.

wkatsak (Thu, 22 Aug 2019 15:09:46 GMT):
Obviously, I can change the way my patch works so it doesn't slow things down, but this seems like something that should be checked. It seems like if the consensus engine comes online too soon (before the validator is ready), something funky happens.

pschwarz (Thu, 22 Aug 2019 15:32:12 GMT):
Are you using the published devmode or are you using devmode nightly?

wkatsak (Thu, 22 Aug 2019 15:35:31 GMT):
nightly

wkatsak (Thu, 22 Aug 2019 15:35:39 GMT):
I'm running the smoke test on master

wkatsak (Thu, 22 Aug 2019 15:38:30 GMT):
You can see this if you look at my pull request for IPv6 (https://github.com/hyperledger/sawtooth-core/pull/2093), the current version was just rebased to master.

wkatsak (Thu, 22 Aug 2019 15:39:17 GMT):
run `bin/run_docker_test test_config_smoke`

wkatsak (Thu, 22 Aug 2019 15:40:43 GMT):
The odd thing is that I do see the validator output the line about registering the devmode

wkatsak (Thu, 22 Aug 2019 15:40:55 GMT):
but it still freezes

wkatsak (Thu, 22 Aug 2019 15:41:04 GMT):
if I go into the devmode container and start an instance manually, the test finishes

wkatsak (Thu, 22 Aug 2019 15:41:17 GMT):
and if I add that sleep mentioned to compose, it also works fine

amundson (Thu, 22 Aug 2019 16:23:38 GMT):
@wkatsak yeah, sounds like a bug. the desired behavior is that startup order of the processes doesn't matter. (except, tests run after everything is ready for the test)

pschwarz (Thu, 22 Aug 2019 16:25:14 GMT):
That's why I asked about versions - there was a fix in devmode master/nightly that should fix that issue

amundson (Thu, 22 Aug 2019 16:26:48 GMT):
recently?

pschwarz (Thu, 22 Aug 2019 16:26:51 GMT):
Looks like it's using `hyperledger/sawtooth-devmode-engine-rust:nightly`

pschwarz (Thu, 22 Aug 2019 16:26:56 GMT):
So, yes, it is a bug

pschwarz (Thu, 22 Aug 2019 16:27:16 GMT):
(Unless a nightly for devmode hasn't been pushed out recently)

amundson (Thu, 22 Aug 2019 16:27:47 GMT):
or @wkatsak has older images cached locally

wkatsak (Thu, 22 Aug 2019 17:08:49 GMT):
Hmm. That could be. I’ll nuke my images and try again

wkatsak (Fri, 23 Aug 2019 13:49:03 GMT):
@amundson @pschwarz This issue still appears with latest devmode nightly

wkatsak (Fri, 23 Aug 2019 13:49:27 GMT):
I removed the reason my patch triggered it, but the underlying issue is still there.

rbuysse (Fri, 23 Aug 2019 14:35:39 GMT):
@pschwarz the hyperledger/sawtooth-devmode-engine-rust:nightly image is 1.2.3-dev13 which is the latest build

mfford (Sat, 24 Aug 2019 21:28:17 GMT):
REMINDER: The Hyperledger Sawtooth Contributor Meeting will be on Monday, August 26th at 10am CDT. The meeting information can be found on the Hyperledger Community Meetings Calendar located here: https://wiki.hyperledger.org/display/HYP/Calendar+of+Public+Meetings Here is the direct zoom link: https://zoom.us/j/438462056 There is still time to add items to the agenda for this meeting. If you have an appropriate topic you would like to discuss and facilitate, please add it to the agenda, located in the wiki here: https://wiki.hyperledger.org/pages/viewpage.action?pageId=16325305 Looking forward to seeing everyone there! -Mark

JonGeater (Mon, 26 Aug 2019 15:26:43 GMT):
Thanks for the update on root causing the 1.2 regression. Sorry I lost signal just as you answered and wasn’t able to acknowledge but it was very encouraging to hear things are going better

jsmitchell (Mon, 26 Aug 2019 16:10:31 GMT):
It was fun to figure out

jsmitchell (Mon, 26 Aug 2019 16:10:41 GMT):
"fun"

JonGeater (Mon, 26 Aug 2019 16:11:01 GMT):
Is there a spec for the test net nodes (in the form of a helm chart or docker file or something)? I may be able to add a node and some useful workload

JonGeater (Mon, 26 Aug 2019 16:11:42 GMT):
I really agree with what was said about the depth of testing needed to get something from 'working' to 'good enough' to 'ready'

JonGeater (Mon, 26 Aug 2019 16:12:20 GMT):
That's what I want to happen

jsmitchell (Mon, 26 Aug 2019 16:13:04 GMT):
@rberg2 @rbuysse ^

rberg2 (Mon, 26 Aug 2019 16:43:11 GMT):
The test networks are run on AWS nodes using the deb packages installed directly; no Docker or Helm involved.

JonGeater (Mon, 26 Aug 2019 16:55:02 GMT):
Thanks @rberg2. Is there a script then, or some recipe to ensure the right configuration and experiments are run in all places? Sorry for the very elementary questions, just seeking whether it's feasible for me and my team to help here

rberg2 (Mon, 26 Aug 2019 17:50:47 GMT):
We have some ansible plays that set up these networks; I will look into sharing those.

JonGeater (Mon, 26 Aug 2019 22:24:52 GMT):
Ah great, thanks. That would work

sanket1211 (Wed, 04 Sep 2019 13:35:07 GMT):
Has joined the channel.

LeonardoCarvalho (Wed, 25 Sep 2019 11:42:44 GMT):
hey guys, do we have a mock validator in any language?

arsulegai (Wed, 25 Sep 2019 13:49:48 GMT):
Is it for the unit test cases?

LeonardoCarvalho (Wed, 25 Sep 2019 20:53:18 GMT):
yup

MHBauer (Sat, 05 Oct 2019 02:08:25 GMT):
Has joined the channel.

jsmitchell (Wed, 09 Oct 2019 17:40:55 GMT):
https://docs.google.com/document/d/12ce5XjmNdMF647mk2IyWdyz1mYPEA7MtCrfpKQw0t3c/edit#heading=h.2y5gwh60nerk

jsmitchell (Wed, 09 Oct 2019 17:41:21 GMT):
working doc for options of aries DID/VCs as an identity source for sawtooth/transact ^

amundson (Wed, 09 Oct 2019 17:42:39 GMT):
@jsmitchell drinking the koolaide?

jsmitchell (Wed, 09 Oct 2019 17:42:54 GMT):
heh

amundson (Wed, 09 Oct 2019 17:43:21 GMT):
I think that's very cool

LeonardoCarvalho (Tue, 15 Oct 2019 10:38:13 GMT):
hello all

LeonardoCarvalho (Tue, 15 Oct 2019 10:38:56 GMT):
I am dealing with an embarrassingly simple problem, and could use some help

LeonardoCarvalho (Tue, 15 Oct 2019 10:39:30 GMT):
my Java TP is sending the setState messages ok, I get the OK from the validator

LeonardoCarvalho (Tue, 15 Oct 2019 10:40:19 GMT):
but the rust validator, in parallel mode, crashes hard after sending the first sequence of transactions

LeonardoCarvalho (Tue, 15 Oct 2019 10:40:44 GMT):
the pattern is, I send any number of INT TP transactions

LeonardoCarvalho (Tue, 15 Oct 2019 10:40:51 GMT):
they get accepted

LeonardoCarvalho (Tue, 15 Oct 2019 10:41:15 GMT):
I get `DEBUG scheduler_parallel] Removed N incomplete batches from the schedule`

LeonardoCarvalho (Tue, 15 Oct 2019 10:41:31 GMT):
and after that, a rust stack trace about a timeout

LeonardoCarvalho (Tue, 15 Oct 2019 10:42:09 GMT):
The image is sawtooth-devmode-engine-rust:1.2

LeonardoCarvalho (Tue, 15 Oct 2019 10:42:15 GMT):
others are at 1.3 level

LeonardoCarvalho (Tue, 15 Oct 2019 10:42:24 GMT):
any ideas on what could be missing?

arsulegai (Tue, 15 Oct 2019 12:28:28 GMT):
Is the issue happening only in parallel scheduling mode? If all the transactions are getting removed, then there's probably a mismatch between the TP's name/version and what is sent by the client. @agunde has a pending PR to fix the TP timeout error. You could be facing the same issue.
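
(For context: the family name/version a TP registers must exactly match what the client sets in each transaction header, or the validator can't route transactions to the TP and the batches eventually get dropped from the schedule. A minimal sketch with the Python SDK; `myfamily` and the prefix are illustrative:)
```
from sawtooth_sdk.processor.handler import TransactionHandler

class MyFamilyHandler(TransactionHandler):

    @property
    def family_name(self):
        return 'myfamily'   # must equal TransactionHeader.family_name

    @property
    def family_versions(self):
        return ['1.0']      # must contain TransactionHeader.family_version

    @property
    def namespaces(self):
        return ['abcdef']   # 6-hex-char state address prefix

    def apply(self, transaction, context):
        pass  # transaction logic goes here
```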

LeonardoCarvalho (Tue, 15 Oct 2019 21:25:25 GMT):
I doubt it; the python code works well. I think I simply forgot to send something back after the set operation...

LeonardoCarvalho (Tue, 15 Oct 2019 21:25:39 GMT):
But I will take a look at the timeout ticket, thanks!

LeonardoCarvalho (Wed, 16 Oct 2019 10:18:51 GMT):
nothing like a good night's sleep. I was swallowing TP_PROCESS_RESPONSES in my flows. Duh.

LeonardoCarvalho (Wed, 16 Oct 2019 10:50:55 GMT):
well, they are sent back, but no dice yet. Even in serial mode. I must be messing up another part of the message flow.

danintel (Wed, 16 Oct 2019 23:13:48 GMT):
Please update the `Sawtooth 1.1 has been released` banner on #sawtooth

pschwarz (Thu, 17 Oct 2019 14:39:20 GMT):
It's removed - you have to refresh for it to take effect

amundson (Thu, 17 Oct 2019 15:32:14 GMT):
.

amundson (Thu, 17 Oct 2019 15:32:20 GMT):

amundson (Thu, 17 Oct 2019 15:32:31 GMT):
rocketchat is super buggy, maybe that will help?

arsulegai (Tue, 22 Oct 2019 10:37:27 GMT):
Is there a plan to update hyperledger/blockchain-explorer to include HL Sawtooth?

amundson (Tue, 22 Oct 2019 17:18:13 GMT):
not that I'm aware of. if someone does start working on more explorer work, we could dig up some ui mockups of some ideas.

arsulegai (Tue, 22 Oct 2019 17:44:32 GMT):
This could've been a good Hyperledger internship project... I see support for other projects being added by interns. Please consider this proposal for the next internship program.

amundson (Tue, 22 Oct 2019 17:58:14 GMT):
@arsulegai take the lead on it, you would be a good mentor

pschwarz (Tue, 22 Oct 2019 18:36:13 GMT):
sure

saanvijay (Thu, 24 Oct 2019 10:14:39 GMT):
Has joined the channel.

tuckerg (Tue, 29 Oct 2019 08:37:38 GMT):
Has joined the channel.

Alwii (Wed, 30 Oct 2019 07:36:22 GMT):
Has joined the channel.

amundson (Fri, 08 Nov 2019 15:52:19 GMT):
I suggest that sawtooth 2.0 be written as a splinter service (splinter providing networking, circuits, etc.). I'll eventually do an RFC for this but wanted to start the discussion prior. For those not familiar with splinter, it is here - https://github.com/cargill/splinter -- it uses transact and sabre in its scabbard component which demonstrates kind of where sawtooth would fit (as a peer of scabbard).

arsulegai (Sat, 09 Nov 2019 08:12:38 GMT):
Splinter is now outside Hyperledger, would that be an issue?

LeonardoCarvalho (Sun, 10 Nov 2019 15:11:26 GMT):
wow, that looks extremely interesting!

alexhq (Tue, 12 Nov 2019 11:31:51 GMT):
Has joined the channel.

amundson (Thu, 14 Nov 2019 22:03:55 GMT):
@arsulegai I don't think so, just another dependency. we can use libsplinter to construct the validator to run it separately too.

MarcoPasotti (Tue, 19 Nov 2019 09:37:04 GMT):
Has joined the channel.

arsulegai (Mon, 25 Nov 2019 14:01:56 GMT):
Do we have a call today?

mfford (Mon, 25 Nov 2019 14:28:04 GMT):
We do not. That meeting is cancelled.

arsulegai (Mon, 25 Nov 2019 15:58:37 GMT):
I see, thanks

hidura (Thu, 28 Nov 2019 01:39:37 GMT):
Has joined the channel.

DaveBuck (Tue, 03 Dec 2019 18:33:13 GMT):
Has joined the channel.

jamesbarry (Mon, 09 Dec 2019 18:37:38 GMT):
@amundson I am just catching this idea of Sawtooth 2.0 being a splinter service. Would that preclude Sawtooth 2.0 from being a standalone chain, or is it simply a means to integrate with Splinter itself? I see uses for Sawtooth beyond sharing between entities (companies). If Sawtooth is a peer to Scabbard, would it retain its own admin and API or subrogate them to Splinter?

amundson (Mon, 09 Dec 2019 19:41:39 GMT):
@jamesbarry splinter would become a core piece of what is necessary to run a sawtooth validator. but, that doesn't preclude compiling libsplinter into a sawtooth-validator and running it that way. it would be more of a customized splinter daemon than anything though.

amundson (Mon, 09 Dec 2019 19:42:24 GMT):
sawtooth would still have its own API (scabbard also has its own API that isn't part of the core splinter daemon, it just happens to ship with splinter)

amundson (Mon, 09 Dec 2019 19:44:00 GMT):
in terms of administration, I think that might depend on how we run it. if we are running a sawtooth-validator, that can be very sawtooth-specific in terms of configuration and administration; if we are running as a sawtooth service in a generic splinter daemon, then things have to be more run-time.

amundson (Mon, 09 Dec 2019 19:45:16 GMT):
in general, I think we need to move toward more runtime-level configuration anyway, and less static config. in splinter, for example, we don't determine peers until we create circuits. so it doesn't make sense to configure peers on the command line of the daemon itself, because that's part of administering the node at runtime.

jamesbarry (Mon, 09 Dec 2019 20:05:16 GMT):
@amundson Thanks for the answer. I am assuming Splinter will not become part of Hyperledger? I will think through the runtime vs. static admin and post some questions back. Do you need a separate #sawtooth-splinter discussion area? I think that the 2.0 decision, once understood, will generate a fair amount of discussion.

jamesbarry (Mon, 09 Dec 2019 20:05:34 GMT):
@amundson We currently depend on validated static admin for our first government customer build. Runtime-level would not work for the specific needs they have, and being the first customer they dictate the direction we are moving....

amundson (Mon, 09 Dec 2019 20:10:25 GMT):
this channel should be light enough traffic to handle the discussion. re: static vs. dynamic - this would be more a concern about how we design the future sawtooth-validator daemon than splinter per se, though there is a circuit-creation step we will need to handle. (this is not necessarily difficult, but it doesn't exist in sawtooth today)

amundson (Mon, 09 Dec 2019 21:39:04 GMT):
as far as Splinter becoming a HL project, maybe -- not sure whether it would be welcome or not, and it takes a substantial amount of energy to propose a HL project either way

amundson (Fri, 03 Jan 2020 17:47:55 GMT):
for those interested in the future direction of transaction processors / transaction handlers and generally smart contract APIs, there is interesting work we are doing in Transact that we anticipate being the path forward for Sawtooth as well. one such aspect is separating out the idea of 'smart contract' from 'transaction handler' (and potentially, I think, renaming 'transaction handler' to 'smart contract engine'). also the simplified smart contract stuff going in should make it easier to write smart contracts (providing code to do addressing, for example, and trickling that into the definition of smart contract).

MatthewRubino (Tue, 21 Jan 2020 13:53:32 GMT):
Has joined the channel.

MatthewRubino (Tue, 21 Jan 2020 15:04:48 GMT):
would someone be able to tell me what might cause this error to happen? https://github.com/hyperledger/sawtooth-core/blob/master/validator/src/journal/block_validator.rs#L703-L709

amundson (Tue, 21 Jan 2020 19:22:32 GMT):
@MatthewRubino what is the full error string?

MatthewRubino (Tue, 21 Jan 2020 19:36:34 GMT):
@amundson something like this ```WARNING | Dummy-13:(unknown file) | [src/journal/block_validator.rs: 284] Error during block validation: BlockValidationError("During validate_on_chain_rules, error creating settings view: NotFound(\"63add5c25ce6b279fb4c91aa9d63e6929474863ad4fbf7828412461c16590ecf\")")``` where the node in question is trying to sync (~1000 blocks behind; ~60k total blocks) and the state root hash is some number of blocks AFTER the current block head (<10)

amundson (Tue, 21 Jan 2020 23:01:29 GMT):
@MatthewRubino "settings view" internally returns settings for a specific state root hash; so when you create it, it takes that state root hash. the error "NotFound" indicates the underlying database (lmdb) couldn't find an entry for that state root hash. usually this would get created and stored as blocks are processed, so the error is a bit strange. is it possible that there were previously io-level errors were the lmdb database is stored (like disk full maybe)?

MatthewRubino (Wed, 22 Jan 2020 14:11:11 GMT):
the disk is definitely not full. in the past we did reach some IOPS limits, but I thought we addressed those. there was not any atypical load on the network anyway. would the remedy be to delete the data dir? or perhaps just certain lmdb files, and have it sync from scratch? or is there a potential race condition bug we may have come across?

MatthewRubino (Wed, 22 Jan 2020 15:50:17 GMT):
@amundson so I moved the data dir and restarted the pod. it starts to sync but then runs into the same issue with a handful of different state hashes, which it then never seems to be able to get past

dock (Wed, 22 Jan 2020 16:07:47 GMT):
Has joined the channel.

dock (Wed, 22 Jan 2020 16:07:48 GMT):
@amundson for a little more context, the state root hash in question (that the broken node can't find) can be found on its peer nodes 2 blocks beyond its own furthest block.

amundson (Wed, 22 Jan 2020 20:26:56 GMT):
what version of Sawtooth are you running? does it have any customizations?

amundson (Wed, 22 Jan 2020 20:29:40 GMT):
I'm wondering if this could be caused by non-deterministic TPs and the error we are seeing is a symptom but not the root cause

amundson (Wed, 22 Jan 2020 20:30:45 GMT):
More generically, is there anything obvious about the TPs' behavior that would be different from XO and intkey?

IWontDiscloseMyIdentity (Thu, 23 Jan 2020 06:09:07 GMT):
Has joined the channel.

MatthewRubino (Thu, 23 Jan 2020 14:13:05 GMT):
we are on 1.2, and yes, we have two TPs of our own. I am fairly certain they are deterministic, but we can do an audit. Curious: if there was an issue like that, wouldn't the node have a different state hash, or be looking for a different one than the other chains have? Trying to see if we can narrow it down and hone in on where things might have gone awry.

amundson (Thu, 23 Jan 2020 15:35:02 GMT):
@MatthewRubino which dot release of 1.2?

amundson (Thu, 23 Jan 2020 15:36:36 GMT):
you are correct that if there is indeterminism, we would expect a state hash mismatch and for the block to be discarded -- just looking for differences between what you are doing and what we've done in our testing

MatthewRubino (Thu, 23 Jan 2020 15:51:02 GMT):
we don't specify a micro version, so just the latest 1.2. the digest matches 1.2.3 (`03974b8bd0b9`)

amundson (Thu, 23 Jan 2020 15:56:24 GMT):
you aren't using the pre-compiled stuff then?

amundson (Thu, 23 Jan 2020 15:57:41 GMT):
what is your version of pbft?

amundson (Thu, 23 Jan 2020 15:57:49 GMT):
(are you using pbft?)

MatthewRubino (Thu, 23 Jan 2020 16:06:09 GMT):
we are using the docker images. and yes pbft. the set version is 1.0 and the digest matches 1.0.1 (`b49c0d01b827`)

arsulegai (Thu, 23 Jan 2020 16:49:19 GMT):
Rephrasing my question: Is it a new node added?

arsulegai (Thu, 23 Jan 2020 16:57:06 GMT):
Did the node in question process the transaction that says it is part of the network from now onwards? Or was it part of the network from the beginning and is just trying to catch up now with the others?

amundson (Thu, 23 Jan 2020 17:02:21 GMT):
@MatthewRubino sounds like versions are the same then

MatthewRubino (Thu, 23 Jan 2020 17:28:53 GMT):
we have tried both. the node was part of a network (1 of 12). It randomly (possibly after a reboot) flatlined due to being unable to progress past that point. subsequent restarts didn't change anything. we have tried moving the data directory out to essentially start it as a new node (with the same key pair), and it is unable to sync anything. it gets the same error, though it cycles over ~3-4 different state hashes.

MatthewRubino (Thu, 23 Jan 2020 17:35:22 GMT):
we have ~60k blocks that have been running over 4-5 months. there haven't been any code changes to sawtooth or our TPs in ~3 months (maybe)

MatthewRubino (Fri, 24 Jan 2020 14:34:12 GMT):
this appears to have happened to a second node now. it is stuck 7 blocks behind the group, with the same sort of error. we have paused incoming batches. I am going to see if I can find anything useful.

amundson (Fri, 24 Jan 2020 16:49:36 GMT):
@MatthewRubino if you come up with some way we could replicate it locally, please share. otherwise, sharing logs or whatever may be helpful.

jsmitchell (Fri, 24 Jan 2020 16:59:30 GMT):
That sounds like some kind of sequencing issue. Something is making an assumption about the presence of an uncommitted state root. We'd need to see the surrounding logs from both the consensus engine and the validator.

MatthewRubino (Fri, 24 Jan 2020 17:30:29 GMT):
working on that. will have to get back to you

MatthewRubino (Fri, 24 Jan 2020 18:22:54 GMT):
are int-key and block-info safe in this sense? we use block-info injection and have an int-key that incs/decs once every 15 minutes to act as a sort of ping. we then have two of our own TPs which, so far as I can tell, don't have any issues in terms of determinism

MatthewRubino (Fri, 24 Jan 2020 19:41:51 GMT):
I did find this in the logs for the two nodes that are now failing (different times and such): ```block-info-tp | WARN | block_info_tp::handl | Invalid Transaction: Timestamp must be less than local time. Expected 1579704304 in (1579704664-300, 1579704664+300)```

MatthewRubino (Fri, 24 Jan 2020 19:44:37 GMT):
that's like 2:45 vs 2:51±300, which is a rather extreme time difference. there was a reboot in the logs a few minutes before. is there some timing bit where it hasn't committed the block but then restarts, and thus ends up with a very different time for validation?

MatthewRubino (Fri, 24 Jan 2020 19:49:57 GMT):
here is the first node that failed, where we moved the data dir to try and get it to sync from scratch

MatthewRubino (Fri, 24 Jan 2020 19:50:12 GMT):
```
WARN | block_info_tp::handl | Invalid Transaction: Timestamp must be less than local time. Expected 1573080676 in (1579670414-300, 1579670414+300)
```
`1573080676` = November 6, 2019 10:51:16 PM
`1579670414` = January 22, 2020 5:20:14 AM

MatthewRubino (Fri, 24 Jan 2020 19:51:05 GMT):
@amundson or @jsmitchell is block-info expected to function in this manner? do we have something about it not set up correctly?

MatthewRubino (Fri, 24 Jan 2020 19:51:51 GMT):
or is that just a red herring?

MatthewRubino (Fri, 24 Jan 2020 19:58:18 GMT):
and we are on `hyperledger/sawtooth-block-info-tp:1.2.3`. I don't recall us having any issues like this on PoET, or perhaps block-info 1.1.x. we updated everything to 1.2.x for PBFT; maybe we missed some setting adjustment

MatthewRubino (Fri, 24 Jan 2020 20:03:11 GMT):
```
'sawtooth.validator.batch_injectors': 'block_info',
'sawtooth.validator.block_validation_rules': 'NofX:1,block_info;XatY:block_info,0;local:0'
```
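
(Per my reading of the Sawtooth validation-rules docs, those settings mean: `NofX:1,block_info` allows at most one block_info transaction per block; `XatY:block_info,0` requires the transaction at index 0 to be a block_info transaction; and `local:0` requires the transaction at index 0 to be signed by the block publisher. A toy checker over a simplified block model, just to illustrate the semantics:)
```
def check_rules(txn_families, txn_signers, publisher_key):
    # NofX:1,block_info -- at most one block_info txn per block
    if txn_families.count('block_info') > 1:
        return False
    # XatY:block_info,0 -- the txn at index 0 must be block_info
    if not txn_families or txn_families[0] != 'block_info':
        return False
    # local:0 -- the txn at index 0 must be signed by the block publisher
    if txn_signers[0] != publisher_key:
        return False
    return True
```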

jsmitchell (Fri, 24 Jan 2020 20:14:24 GMT):
That _in range_ does not seem correct for historical timestamps

MatthewRubino (Fri, 24 Jan 2020 20:17:40 GMT):
right. so I haven't tried to reproduce it locally, but if that is the issue, it appears it would be easy to do

jsmitchell (Fri, 24 Jan 2020 20:19:51 GMT):
imo the check should be in range (prior block's block info timestamp, local clock+tolerance)

jsmitchell (Fri, 24 Jan 2020 20:21:47 GMT):
where "prior block's info timestamp" is just the current state value timestamp

amundson (Fri, 24 Jan 2020 20:33:05 GMT):
```
fn validate_timestamp(timestamp: u64, tolerance: u64) -> Result<(), ApplyError> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("System time is before Unix epoch.")
        .as_secs();
    if timestamp < (now - tolerance) || (now + tolerance) < timestamp {
        let warning_string = format!(
            "Timestamp must be less than local time. Expected {0} in ({1}-{2}, {1}+{2})",
            timestamp, now, tolerance
        );
        warn!("Invalid Transaction: {}", &warning_string);
        return Err(ApplyError::InvalidTransaction(warning_string));
    }
    Ok(())
}
```

amundson (Fri, 24 Jan 2020 20:33:42 GMT):
that's the logic in the rust version of block_info (1.2.3 - not sure if that is rust, might be python, but probably the same logic)

amundson (Fri, 24 Jan 2020 20:36:18 GMT):
using block_info requires all the nodes to keep accurate time (using NTP probably) and be accurate at least to tolerance. If you had a node with bad time and it rebooted, when it came up it might have synced time with ntpdate (or similar) and now have a more accurate time.

amundson (Fri, 24 Jan 2020 20:37:14 GMT):
though, based on your error, it looks like your local time went backwards

amundson (Fri, 24 Jan 2020 20:39:46 GMT):
no, that's not the case (misread it)

MatthewRubino (Fri, 24 Jan 2020 20:43:42 GMT):
so that one with the date from Nov is because it is starting over from block 0, as opposed to keeping up. I think if they fall behind more than 5 minutes they cannot validate new blocks as they come in

MatthewRubino (Fri, 24 Jan 2020 20:44:39 GMT):
this definitely worked before we did the PBFT upgrade; no one on the team recalls us trying to sync from 0 with PBFT

amundson (Fri, 24 Jan 2020 20:44:43 GMT):
that looks like a bug to me, should be "if (now + tolerance) < timestamp" to only check timestamp is less than the upper bound

amundson (Fri, 24 Jan 2020 20:45:00 GMT):
did you go from a different version of 1.2.x to 1.2.3 at the same time?

MatthewRubino (Fri, 24 Jan 2020 20:45:19 GMT):
before PBFT it was likely 1.1.x

amundson (Fri, 24 Jan 2020 20:45:45 GMT):
let me look at that impl

amundson (Fri, 24 Jan 2020 20:46:49 GMT):
```
def validate_timestamp(timestamp, tolerance):
    now = time.time()
    if (timestamp - now) > tolerance:
        raise InvalidTransaction(
            "Timestamp must be less than local time."
            " Expected {0} in ({1}-{2}, {1}+{2})".format(
                timestamp, now, tolerance))
```

amundson (Fri, 24 Jan 2020 20:47:21 GMT):
1.1.x was python, 1.2.x is rust. I think the bug existed in python at one time, got fixed there, and had probably already been carried over to the rust impl before that fix.

amundson (Fri, 24 Jan 2020 20:49:43 GMT):
this was the python fix - ```
commit 5e7315a9f0e3c8863327034c036b28e70850112c
Author: Peter Schwarz
Date:   Tue Feb 6 15:31:12 2018 -0600

    Correct timestamp check for catch up

    The time check needs to ensure that the timestamp is only ahead of a
    transaction processor's local time by the value of tolerance. The use
    of absolute value enforced that this time check is within tolerance
    of the local time. This fails validation in the case where a node is
    catching up on a chain that may contain blocks that were published
    more than time-tolerance in the past. Correcting this to ensure that
    the timestamp is no more than tolerance in the future ensures that
    the transaction can still be validated during a catch up scenario.

    Fixes STL-1048
```

MatthewRubino (Fri, 24 Jan 2020 20:51:48 GMT):
is there a python TP we can or should use instead? would swapping/fixing the TP fix it, or is our chain a bust?

MatthewRubino (Fri, 24 Jan 2020 20:53:39 GMT):
and is it valuable for me to try and reproduce this locally at this point?

amundson (Fri, 24 Jan 2020 20:53:43 GMT):
probably best to fix up the rust one and then use that. since it will be less restrictive, should be fine.

amundson (Fri, 24 Jan 2020 20:54:59 GMT):
not valuable to reproduce this specific bug, no

MatthewRubino (Fri, 24 Jan 2020 20:57:54 GMT):
is that something I should make a PR for? and is it just to github.com?

amundson (Fri, 24 Jan 2020 21:05:08 GMT):
@MatthewRubino I put up a PR - https://github.com/hyperledger/sawtooth-core/pull/2280 - do you have the ability there to test it?

amundson (Fri, 24 Jan 2020 21:06:07 GMT):
that is against master, I will backport it to the 1-2 branch after it goes into master

MatthewRubino (Fri, 24 Jan 2020 21:16:16 GMT):
thanks. we might be able to get it into AWS and see that it fixes our test environment. would that get pushed to a nightly or something? that would make it much easier as opposed to an ECR repo and such; though we could still figure something like that out. but probably not until monday. (we are all east coast)

kodonnel (Mon, 27 Jan 2020 16:02:46 GMT):
Is there a contributors call today? Still on the calendar, but the zoom id is invalid.

mfford (Mon, 27 Jan 2020 16:09:37 GMT):
That was previously cancelled in late 2019. I noticed it was added back to the calendar last week, and messaged for it to be removed

kodonnel (Mon, 27 Jan 2020 16:10:19 GMT):
Cancelled just for Jan? And back in business Feb I assume?

amundson (Mon, 27 Jan 2020 16:14:09 GMT):
no, the idea was to have the conversations here instead of a meeting, and have meetings on specific topics if we identify them here.

kodonnel (Mon, 27 Jan 2020 16:17:25 GMT):
Then in case no one else has, I'd suggest that we revisit that decision after a while to see how well it is working.

amundson (Mon, 27 Jan 2020 16:27:47 GMT):
did you have a topic?

kodonnel (Mon, 27 Jan 2020 16:31:08 GMT):
Not this round. But it was a good way to touch base, and different people work and communicate differently. It seems worth the effort to check in on that decision after a few months to see if it is still working well.

amundson (Mon, 27 Jan 2020 16:42:35 GMT):
there was another splinter release - https://github.com/Cargill/splinter/blob/master/RELEASE_NOTES.md - of particular interest for Sawtooth is probably how we are working with experimental Rust features there - https://github.com/Cargill/splinter-docs/blob/master/docs/community/stable_feature_checklist.md

amundson (Mon, 27 Jan 2020 16:42:51 GMT):
we should probably do the same thing w/Sawtooth going forward

Dan (Mon, 27 Jan 2020 17:11:46 GMT):
FYI on common repo files across HL: https://wiki.hyperledger.org/x/QQR6AQ

jamesbarry (Wed, 29 Jan 2020 21:21:27 GMT):
I think nodes never catching up is a big enough issue to push this Jira, STL-1510, up in priority and get it fixed. We are having the issue too. We are backing out our custom code so we can recreate the issue and demo it to others. It is definitely an issue that when a node gets disconnected or out of sync, that node will not catch up to the other nodes. Our only workaround is to restart the entire blockchain; that solves the issue until it happens again, but we lose all of the intervening transactions. It will take us a few days to have a setup where we can show a node completely disconnected and reconnection not syncing. I cannot get into Jira to update STL-1510. In addition to us, Rajaram Kannan is in a thread on the Sawtooth email list, and @MicaelFerreira @MatthewRubino @wkatsak are all having it. We thought it was our custom code causing the issues. When we have a clean log showing the issue, is there a place to put it?

MicaelFerreira (Wed, 29 Jan 2020 21:21:27 GMT):
Has joined the channel.

jamesbarry (Wed, 29 Jan 2020 21:24:17 GMT):
We have a second issue, around volume, that we will try recreating so others can view it. It concerns pending transactions created by a high-volume node: the queue does not replicate fast enough to catch up, and thus we end up with either dropped transactions or transactions pending forever. Another one we will try to build and recreate. This one seems to be connected to the node-not-catching-up issue.

MatthewRubino (Wed, 29 Jan 2020 22:17:35 GMT):
is that the right Jira? I wasn't able to find anything like this issue. can you link me when you get into Jira?

jamesbarry (Wed, 29 Jan 2020 22:41:45 GMT):
@MatthewRubino You are correct in that that is another, though related, issue. That was my mistake; I'm working on too many things simultaneously and apologize for any confusion. We are rebuilding a test suite without our custom code to demo in minikube so we can show the issue. Our test suite was testing higher volumes on the chain and then testing for disconnected nodes and reconnecting them, as this is part of our product's value proposition. If a node reconnects, we are having issues where the transactions never catch up. We also have an intermittent issue with the disconnected validator not reconnecting. Our third issue has been the loss of "pending" transactions dropping from the queue in high-volume situations. We need more detail so you all can see the exact issue; I will not comment until our test cases are consistently showing the issues.

MatthewRubino (Wed, 29 Jan 2020 22:49:40 GMT):
have you guys set up InfluxDB and Grafana? that usually shows the pending txns and back-pressure building up if you are hitting things hard.

amundson (Wed, 29 Jan 2020 22:50:15 GMT):
A few non-trivial long-term fixes for catch-up:
1) send txn receipts along with blocks, and apply these receipts efficiently without executing transactions (if you end up with the same state hash, this should be safe as long as you already know the block is the correct one, as we can accomplish with pbft)
2) use the txn receipts to solve non-determinism errors (similar to (1), but using txn receipts if you already trust the block is chosen and can't recreate state by running the txns)
3) implement state checkpointing, which allows the transfer of state instead of blocks if the block store size exceeds the state store size
4) expanding on (3), make it so you can do this as-you-go, so you can start processing blocks without having a complete copy of state

MatthewRubino (Wed, 29 Jan 2020 22:50:17 GMT):
that's not to say that recovering and such isn't still an issue. and that ticket does sound very related

jamesbarry (Wed, 29 Jan 2020 23:04:21 GMT):
@MatthewRubino We have set up Grafana. We keep tuning Sawtooth based on the results we see through Grafana and the logs. As an FYI, our test environment is 5 nodes; to stress it, we started by running soak tests grabbing 5 different weather reports, 1 per node per minute. We got the same setup to 1 weather report per node every 10 seconds and started stressing it out. Faster than every 10 seconds has not been successful yet. At a weather report per node every 10 seconds, disconnecting a node and reconnecting it led to transactions on the disconnected node not catching up, and sometimes nodes not reconnecting. Anyway, we are three people with a lot on our plate and will try to demo it properly so all can look at the issue.

MicaelFerreira (Thu, 30 Jan 2020 12:36:18 GMT):
Today I had the same pbft exception on a different node, `InternalError: Couldn't find 2f commit messages in the message log for building a seal`; I just had to restart the node's validator to make it sync again, no need to restart the whole network

MicaelFerreira (Thu, 30 Jan 2020 12:46:34 GMT):
I would like to propose an important validation in the settings TP when applying a vote: if the setting to be changed is `authorized_keys`, check that the length of the future authorized_keys list is greater than or equal to the approval_threshold, and proceed if true or reject if not. With the settings TP as it is, we can remove an authorized key and end up with fewer keys to vote than the approval_threshold, which can invalidate any future settings changes.
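
(A sketch of the proposed guard, written against a hypothetical settings-TP apply path; the helper name and the comma-separated key encoding are assumptions, not the actual settings-TP code:)
```
from sawtooth_sdk.processor.exceptions import InvalidTransaction

def check_authorized_keys_change(new_value, approval_threshold):
    # Reject a vote that would leave fewer authorized keys than
    # the approval threshold requires.
    future_keys = [k for k in new_value.split(',') if k]
    if len(future_keys) < approval_threshold:
        raise InvalidTransaction(
            'authorized_keys would drop below approval_threshold '
            '({} < {})'.format(len(future_keys), approval_threshold))
```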

IWontDiscloseMyIdentity (Thu, 30 Jan 2020 14:01:10 GMT):
Hi Team, I am trying to get the transaction id in the response, but I'm getting a batch id: "link": "http://localhost:8008/batch_statuses?id=251339f1cb930dc1b5d4002de941e3fc6219a7d139fe3a8ce024c99e2bb9e383496ed83d9e888bcbd2eb2286c4a4962988f839d267ef35bc8fc316647234c8e2". Could someone please tell me how to get the transaction id instead of the batch id? The return shows a tx id as output, but on the client side I am getting the batch id. Please help with this.

IWontDiscloseMyIdentity (Thu, 30 Jan 2020 14:01:18 GMT):
I'm getting this when I look at the response in the Transaction Processor: response: 'Success', TxId: [ '1a4ecccf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce' ]. I want to get this TxId on the client side, but I'm not receiving it there.
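
(A note that may help: a transaction's id is simply its header_signature, which the client computes when it signs the header, so it is available before the batch is ever submitted. A sketch assuming a `signer` from sawtooth_signing and already-serialized header/payload bytes:)
```
from sawtooth_sdk.protobuf.transaction_pb2 import Transaction

# txn_header_bytes and payload_bytes are assumed to be built already;
# signer is a sawtooth_signing Signer
signature = signer.sign(txn_header_bytes)

txn = Transaction(
    header=txn_header_bytes,
    header_signature=signature,  # this hex string is the transaction id
    payload=payload_bytes,
)

print('transaction id:', txn.header_signature)
```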

wkatsak (Thu, 30 Jan 2020 18:16:01 GMT):
@amundson @jamesbarry @MatthewRubino So we just managed to get our test chain (not docker, 5 physical geo-distributed nodes) into this state where transactions get stuck in PENDING. Essentially everything you submit gets stuck in pending (we have tried our app + intkey, all the same). All TPs are up.

wkatsak (Thu, 30 Jan 2020 18:17:39 GMT):
the validators don't even seem to be touching the pbft service, as its timestamps are frozen

wkatsak (Thu, 30 Jan 2020 18:17:46 GMT):
i.e., not producing any more output

wkatsak (Thu, 30 Jan 2020 18:23:03 GMT):
For us, block 188 is the last block validated, and compare-chains shows all consistent

wkatsak (Thu, 30 Jan 2020 18:33:17 GMT):
Block 185 shows a series of Failed block messages

wkatsak (Thu, 30 Jan 2020 18:33:22 GMT):
An example from one node

wkatsak (Thu, 30 Jan 2020 18:44:38 GMT):
```
[17:21:17.005 [Dummy-5] (unknown file) INFO] [src/state/state_pruning_manager.rs: 134] Pruned 102 keys from the Global state Database
[17:21:17.132 [ThreadPoolExecutor-1_7] responder DEBUG] Responding to batch requests 0ae2e61f57a600ae03b99ad5269727efdafa0fac0f94ac43360d3550ccdeba936153cec09232352d607d288f25de4941505c25f8e50b43ea4cd35b59600ac301
[17:21:17.132 [ThreadPoolExecutor-1_7] responder DEBUG] Responding to batch requests 0ae2e61f57a600ae03b99ad5269727efdafa0fac0f94ac43360d3550ccdeba936153cec09232352d607d288f25de4941505c25f8e50b43ea4cd35b59600ac301
[17:21:17.637 [ThreadPoolExecutor-7_0] ffi INFO] [src/journal/chain.rs: 578] Failed block Block(id: 051ccdc6b483173e8902c1fdd1f631be4ccfc2086d54d8af10c70d6b2b79c8684072af379a3aa8dfd15d770afcf83aeb54426f5a737ac2a3e3a7a6ffdd6ea7ba, block_num: 185, state_root_hash: dcb17471b755dc8ccafeacd1e9893eefdbdb73aa776f844caab0cdbb893dd85e, previous_block_id: 479902578e082bdb820a966d2faee9761147c332d1ead1ece351458e7ac271d754fe6d8fbf3b2f7fc13be6caa301eaf6afe6176311c7be2924ac35538fb78b14)
[17:21:17.637 [ThreadPoolExecutor-7_0] ffi INFO] [src/journal/chain.rs: 578] Failed block Block(id: 051ccdc6b483173e8902c1fdd1f631be4ccfc2086d54d8af10c70d6b2b79c8684072af379a3aa8dfd15d770afcf83aeb54426f5a737ac2a3e3a7a6ffdd6ea7ba, block_num: 185, state_root_hash: dcb17471b755dc8ccafeacd1e9893eefdbdb73aa776f844caab0cdbb893dd85e, previous_block_id: 479902578e082bdb820a966d2faee9761147c332d1ead1ece351458e7ac271d754fe6d8fbf3b2f7fc13be6caa301eaf6afe6176311c7be2924ac35538fb78b14)
[17:21:17.640 [ThreadPoolExecutor-7_1] ffi INFO] [src/journal/chain.rs: 578] Failed block Block(id: 128575c77b54816f48cd375f294542a149f11e6b300c6a31ac736642c17757380880d52231e5f20d64878b9f7ed6778d6319b1e87f8733a8a7ff81f8e7337248, block_num: 185, state_root_hash: 2cc8a12a1e18dcc9288984152546bd392d90e9848377110cba766cd98267bf37, previous_block_id: 479902578e082bdb820a966d2faee9761147c332d1ead1ece351458e7ac271d754fe6d8fbf3b2f7fc13be6caa301eaf6afe6176311c7be2924ac35538fb78b14)
[17:21:17.640 [ThreadPoolExecutor-7_1] ffi INFO] [src/journal/chain.rs: 578] Failed block Block(id: 128575c77b54816f48cd375f294542a149f11e6b300c6a31ac736642c17757380880d52231e5f20d64878b9f7ed6778d6319b1e87f8733a8a7ff81f8e7337248, block_num: 185, state_root_hash: 2cc8a12a1e18dcc9288984152546bd392d90e9848377110cba766cd98267bf37, previous_block_id: 479902578e082bdb820a966d2faee9761147c332d1ead1ece351458e7ac271d754fe6d8fbf3b2f7fc13be6caa301eaf6afe6176311c7be2924ac35538fb78b14)
[17:21:17.715 [ThreadPoolExecutor-7_1] ffi DEBUG] [src/journal/block_scheduler.rs: 166] Adding block eda8714eb9eae3c44d07b6fb5dce9fc7237e40af7fefe0bc0544d821b56cc6e07f783d1f236c296a2a8ff99e257f17b86f2be2bfa5876e2eb43d53ed281e7b6c for processing
[17:21:17.715 [ThreadPoolExecutor-7_1] ffi DEBUG] [src/journal/block_scheduler.rs: 166] Adding block eda8714eb9eae3c44d07b6fb5dce9fc7237e40af7fefe0bc0544d821b56cc6e07f783d1f236c296a2a8ff99e257f17b86f2be2bfa5876e2eb43d53ed281e7b6c for processing
[17:21:17.920 [Dummy-4] (unknown file) INFO] [src/journal/block_validator.rs: 265] Block eda8714eb9eae3c44d07b6fb5dce9fc7237e40af7fefe0bc0544d821b56cc6e07f783d1f236c296a2a8ff99e257f17b86f2be2bfa5876e2eb43d53ed281e7b6c passed validation
[17:21:17.920 [Dummy-4] (unknown file) INFO] [src/journal/block_validator.rs: 265] Block eda8714eb9eae3c44d07b6fb5dce9fc7237e40af7fefe0bc0544d821b56cc6e07f783d1f236c296a2a8ff99e257f17b86f2be2bfa5876e2eb43d53ed281e7b6c passed validation
[17:21:17.990 [Dummy-5] (unknown file) INFO] [src/journal/chain.rs: 206] Building fork resolution for chain head 'Block(id: cf74778ffc28b99d170cea524f3c124ffd105a439a604a17e7fe9d9921f8f2013ba061a7556de9349691a6fad3b44932238b02437e9ac08b43b6822b475c7303, block_num: 185, state_root_hash: 2cc8a12a1e18dcc9288984152546bd392d90e9848377110cba766cd98267bf37, previous_block_id: 479902578e082bdb820a966d2faee9761147c332d1ead1ece351458e7ac271d754fe6d8fbf3b2f7fc13be6caa301eaf6afe6176311c7be2924ac35538fb78b14)' against new block 'Block(id: c6af7c91d26c075d14b5408be396e1f43bf21dbe6ce58218a3f284e2881065a24ea54b1a6c258dadf94c567d1707bf3d1f393d9ac0c92188fc93d0c2460999dd, block_num: 186, state_root_hash: dcb17471b755dc8ccafeacd1e9893eefdbdb73aa776f844caab0cdbb893dd85e, previous_block_id: cf74778ffc28b99d170cea524f3c124ffd105a439a604a17e7fe9d9921f8f2013ba061a7556de9349691a6fad3b44932238b02437e9ac08b43b6822b475c7303)'
```

wkatsak (Thu, 30 Jan 2020 19:00:33 GMT):
Here is that node's pbft log

wkatsak (Thu, 30 Jan 2020 19:00:45 GMT):

wkatsak - Thu Jan 30 2020 14:00:39 GMT-0500 (Eastern Standard Time).txt

wkatsak (Thu, 30 Jan 2020 19:05:43 GMT):
To me this looks ok, just like a block being resolved

ltseeley (Thu, 30 Jan 2020 19:05:48 GMT):
@wkatsak hard to say for sure what happened around block 185, but it was able to handle it eventually.

wkatsak (Thu, 30 Jan 2020 19:06:08 GMT):
@ltseeley that's my thought as well, as other blocks are committed afterwards, on all nodes

wkatsak (Thu, 30 Jan 2020 19:06:25 GMT):
What stops it dead though seems to be one node losing connection and reconnecting

ltseeley (Thu, 30 Jan 2020 19:07:56 GMT):
Do you have validator logs that indicate that?

wkatsak (Thu, 30 Jan 2020 19:08:45 GMT):
I'm trying to collect now

wkatsak (Thu, 30 Jan 2020 19:08:53 GMT):
this might have happened after the failure

wkatsak (Thu, 30 Jan 2020 19:08:55 GMT):
checking

wkatsak (Thu, 30 Jan 2020 19:09:17 GMT):
nm, the disconnect happened 4 mins after the network seized up

wkatsak (Thu, 30 Jan 2020 19:15:05 GMT):
actually reconnect happened later

wkatsak (Thu, 30 Jan 2020 19:15:45 GMT):
two of my nodes show a series of errors sending PING_RESPONSE and NETWORK_ACK. Like this:

wkatsak (Thu, 30 Jan 2020 19:15:46 GMT):
`[17:21:17.099 [ThreadPoolExecutor-1_3] dispatch WARNING] Can't send message NETWORK_ACK back to 6e8569f0e8fa584304a9d314665b6a2931c5295936ef47b009e53c83490a28315f0260dc2d9bc1e761b8fe35c252d2439a18af9cb6c76d92764aeeccd0837f08 because connection OutboundConnectionThread-tcp://bc3.dcntral.net:8800 not in dispatcher`

wkatsak (Thu, 30 Jan 2020 19:17:15 GMT):
this all lines up with 17:21, which is when the network locked up

wkatsak (Thu, 30 Jan 2020 19:17:22 GMT):
I can upload the entire logs if that helps

wkatsak (Thu, 30 Jan 2020 19:49:57 GMT):
@ltseeley I restarted bc4 and bc5, which were the ones with the comm errors, and the chain came back to life.

wkatsak (Thu, 30 Jan 2020 19:50:21 GMT):
I've been suspecting for some time that the peering/connection management has a bug, or some inconsistency

wkatsak (Thu, 30 Jan 2020 19:50:41 GMT):
even now, it's operating, but I see something like this: `Can't send message PING_RESPONSE back to 0e904482b541c3615d021925377d35894c6964e117c6e972d0d5c1e7443c4b38a5126b72a9e61bfdaaf79a31bcd50f3657aa05a85ef08a2266c385840fc15e15 because connection OutboundConnectionThread-tcp://bc4.dcntral.net:8800 not in dispatcher`

wkatsak (Thu, 30 Jan 2020 19:56:24 GMT):
another node, bc3, has some batches stuck in pending, and these won't unwedge

wkatsak (Thu, 30 Jan 2020 19:56:33 GMT):
even when submitting new ones from the same family

MatthewRubino (Thu, 30 Jan 2020 20:37:33 GMT):
so possibly because of the network issues it thinks it shared them with the other nodes? thus if you reboot, they go away and are lost? otherwise they stay, never being considered?

wkatsak (Thu, 30 Jan 2020 21:07:54 GMT):
that's what it looks like. they are stuck in pending unless I restart the validator

wkatsak (Thu, 30 Jan 2020 21:08:13 GMT):
I ended up having to restart everything to get the block heights to all agree

wkatsak (Thu, 30 Jan 2020 21:08:33 GMT):
I don't know if the network issues are related at all

Tomomi.Yamano (Mon, 03 Feb 2020 04:51:21 GMT):
Has joined the channel.

MicaelFerreira (Mon, 03 Feb 2020 10:05:42 GMT):
I used to have a lot of those ping response messages as well, but the network is still good so far

MicaelFerreira (Mon, 03 Feb 2020 10:26:06 GMT):
Guys, I have a question about pbft members: I removed one of my nodes from the network, removed it from the pbft members list, and removed all its data as well. After that, I joined the node to the network as a new node (without adding it to the pbft members list). The node received all the blocks but the last one (even with all the errors that are shown while catching up on blocks). I did some transactions on other nodes and on the new node as well, and this new node keeps getting the blocks, but always one block behind. So, without adding the new node to pbft members, it looks like it can receive/validate blocks and publish blocks. Looking at `on_peer_message` https://github.com/hyperledger/sawtooth-pbft/blob/master/src/node.rs#L104, I see that if the node is not part of the network it should not send any kind of messages, but it looks like it does. All I see in the logs is the `InvalidMessage` WARN, so how can it receive and publish blocks? What is happening?

MicaelFerreira (Mon, 03 Feb 2020 10:45:00 GMT):
Got this exception in the middle of the process of catching up the network with a newly added node (the network has about 720 blocks, and this happened at around block 315):
```
Feb 03 10:37:07 node pbft-engine[9612]: WARN | pbft_engine::engine: | InvalidMessage: NewView failed verification - Error was: InvalidMessage: Node is on view 4008, but received NewView message for view 4008
Feb 03 10:37:30 node sawtooth-validator[9588]: thread 'ChainThread:CommitReceiver' panicked at 'No method cancel on python scheduler: PyErr { ptype: , pvalue: Some(KeyError('Value was not found',)), ptraceback: Some() }', src/libcore/result.rs:1084:5
Feb 03 10:37:30 node sawtooth-validator[9588]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Feb 03 10:37:31 node sawtooth-validator[9588]: thread 'ChainThread:ValidationResultReceiver' panicked at 'No lock holder should have poisoned the lock: "PoisonError { inner: .. }"', src/libcore/result.rs:1084:5
Feb 03 10:37:31 node sawtooth-validator[9588]: thread '' panicked at 'RwLock is poisoned: "PoisonError { inner: .. }"', src/libcore/result.rs:1084:5
Feb 03 10:37:31 node sawtooth-validator[9588]: fatal runtime error: failed to initiate panic, error 5
Feb 03 10:37:31 node pbft-engine[9612]: ERROR | pbft_engine::engine: | Disconnected from validator; stopping PBFT
Feb 03 10:37:31 node pbft-engine[9612]: ERROR | pbft_engine:108 | ReceiveError: Unexpected error while receiving: DisconnectedError
```

MicaelFerreira (Mon, 03 Feb 2020 10:50:34 GMT):
Had to restart the node to successfully sync with the network

cg223 (Mon, 03 Feb 2020 19:07:17 GMT):
Has joined the channel.

MatthewRubino (Mon, 03 Feb 2020 19:53:50 GMT):
so with the block info TP changes I am able to get the network syncing again. however, the behind nodes will only sync up to 1 block behind the head. they then stop and will not go further. i have restarted the node numerous times to no avail.

MatthewRubino (Mon, 03 Feb 2020 19:59:17 GMT):
i am trying to get our network to a state where i can remove it and then commit transactions, to see if it will still catch up to N-1

MatthewRubino (Tue, 04 Feb 2020 14:14:23 GMT):
i cannot get a new transaction to commit (I believe) as a result of this error: `ERROR | pbft_engine::engine: | InternalError: Couldn't find 2f commit messages in the message log for building a seal`

MatthewRubino (Tue, 04 Feb 2020 16:04:20 GMT):
we had taken some snapshots and i spun up some early node states. i removed the node that had our max block, 61500. I left the nodes that have the N-1 (61499). The snapshots were at various states around 58k-59k. they synced against the 61499 nodes. they stopped at 61498... so there appears to be something broken with syncing, in that they do not want to catch up to head.

ltseeley (Tue, 04 Feb 2020 17:25:25 GMT):
@MatthewRubino what validator and PBFT versions are you using?

MatthewRubino (Tue, 04 Feb 2020 17:26:11 GMT):
validator is 1.2.3 and pbft is 1.2.4

MatthewRubino (Tue, 04 Feb 2020 17:26:48 GMT):
we have 12 nodes. so we figured out that our 2f+1 is 6 1/3, so we likely need 7 nodes for consensus. we only had 12

MatthewRubino (Tue, 04 Feb 2020 17:26:52 GMT):
this is why we couldn't commit

MatthewRubino (Tue, 04 Feb 2020 17:27:18 GMT):
but unsure ATM if that is related to not syncing to head (vs head - 1)

MatthewRubino (Tue, 04 Feb 2020 17:28:28 GMT):
from the code: `let f = ((config.members.len() - 1) / 3) as u64;`
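
For reference, the "6 1/3" above comes from taking the raw fraction; the quoted Rust line uses integer division, so for 12 members f = (12 - 1) / 3 = 3 and the commit quorum 2f + 1 = 7. A minimal sketch of the same arithmetic (a Python mirror of the quoted line; the member list is illustrative):

```python
def pbft_fault_tolerance(members):
    # Mirror of PBFT's `let f = ((config.members.len() - 1) / 3) as u64;`
    f = (len(members) - 1) // 3   # max tolerable faulty nodes (integer division)
    quorum = 2 * f + 1            # nodes needed to commit
    return f, quorum

members = [f"node-{i}" for i in range(12)]    # illustrative 12-node network
print(pbft_fault_tolerance(members))          # (3, 7): f = 3, quorum = 7
```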

jamesbarry (Tue, 04 Feb 2020 18:26:29 GMT):
Are you running PBFT 1.0.0? There is no 1.2.4 for PBFT

MatthewRubino (Tue, 04 Feb 2020 18:47:10 GMT):
ah right, sorry; PBFT is 1.0.1 (`b49c0d01b827`)

jamesbarry (Tue, 04 Feb 2020 18:58:30 GMT):
@amundson @arsulegai @wkatsak Is there any documentation - other than in the code - for the node connection management internal to Sawtooth? We believe there is a network bug inside the validator. We believe the bug is internal because, under certain circumstances, a network interruption between one or more nodes can cause the entire consensus to stop. Bill is traveling internationally and will continue to try to reproduce this so it can be shown when he lands. In the meantime, let us know if there is anything other than code we can look through.

Dan (Tue, 04 Feb 2020 20:32:33 GMT):
There's this networks section in the docs: https://sawtooth.hyperledger.org/docs/core/releases/latest/architecture/validator_network.html

jamesbarry (Tue, 04 Feb 2020 21:52:14 GMT):
Thanks for the link, we looked at it. Bill (our chief architect/programmer) was looking for some deeper docs on internal flows. We don't have issues between nodes, but rather within nodes. We are trying to get a demo assembled to show the issue and make it repeatable.

amundson (Tue, 04 Feb 2020 23:06:45 GMT):
@jamesbarry a bug like that could be in the validator or within the consensus engine (either side), if you get a sense for one or the other from the logs that would be helpful too. maybe try taking the consensus engine down and back up to make sure it reconnects and starts working again (it should).

jamesbarry (Tue, 04 Feb 2020 23:16:57 GMT):
@amundson @wkatsak Thank you for the suggestion. We are in test and have brought the consensus down and up several times. Just did our 3rd complete refresh from genesis and have the same issue. I posted the comment, as several folks have issues that may have a similar origin. We haven't gotten a consistent reproduction yet, so I will be fairly quiet until we have a replication that shows it happening consistently. In the meantime we are on Sawtooth 1.2.4 with 1.0.2 for the PBFT. We did have a five-node AWS network that confirmed 85k transactions into the ledger in 30 hours before it died. Our old v1.1.x four-node mixed hardware/software blockchain, running a consistent set of transactions daily, was up for five months and 750k+ transactions, all with zero issues. The 1.2.x versions have given us trouble. Not sure if it's Sawtooth or our custom code. When we are sure, you'll see logs and a reproducer. Thanks for your thoughts.

amundson (Tue, 04 Feb 2020 23:56:29 GMT):
is the network constantly busy, or does it have quiet periods?

wkatsak (Wed, 05 Feb 2020 00:19:22 GMT):
On this test we have each node committing something every 10 seconds at minimum.

wkatsak (Wed, 05 Feb 2020 00:20:20 GMT):
I thought I read someplace that if you restart consensus you need to restart the validator. Is this not true anymore?

amundson (Wed, 05 Feb 2020 01:49:19 GMT):
hmm, not 100% sure. that wasn't the original intent.

MatthewRubino (Wed, 05 Feb 2020 17:48:43 GMT):
is there anything in pbft-engine that needs to be upgraded as a result of validator 1.2.4 (presumably the zmq heartbeat stuff)? I get disconnects and docker container terminations.

MatthewRubino (Wed, 05 Feb 2020 17:56:45 GMT):
```
pbft-engine-1 | INFO | pbft_engine:88 | Sawtooth PBFT Engine (1.0.1)
pbft-engine-1 | INFO | pbft_engine::engine: | Startup state received from validator: StartupState { chain_head: Block(block_num: 0, block_id: [...], previous_id: [...], signer_id: [...], payload: 47656e65736973, summary: 9e32bb0045dc6e7008a5b57c34558aa0dc08fa657ca2d3d4063e22a39014a1fe), peers: [PeerInfo { peer_id: [...] }, PeerInfo { peer_id: [...] }, PeerInfo { peer_id: [...] }, PeerInfo { peer_id: [...] }], local_peer_info: PeerInfo { peer_id: [...] } }
pbft-engine-1 | DEBUG | pbft_engine::config: | Getting on-chain settings for config
pbft-engine-1 | INFO | pbft_engine::engine: | PBFT config loaded: PbftConfig { members: [...], block_publishing_delay: 1s, update_recv_timeout: 10ms, exponential_retry_base: 100ms, exponential_retry_max: 60s, idle_timeout: 30s, commit_timeout: 10s, view_change_duration: 5s, forced_view_change_interval: 100, max_log_size: 10000, storage_location: "memory" }
pbft-engine-1 | INFO | pbft_engine::engine: | PBFT state created: (PP, view 0, seq 1)
pbft-engine-1 | INFO | pbft_engine::engine: | Received PeerConnected message with peer info: PeerInfo { peer_id: [...] }
pbft-engine-1 | INFO | pbft_engine::engine: | Received PeerConnected message with peer info: PeerInfo { peer_id: [...] }
pbft-engine-1 | INFO | pbft_engine::engine: | Received PeerConnected message with peer info: PeerInfo { peer_id: [...] }
pbft-engine-1 | INFO | pbft_engine::engine: | Received PeerConnected message with peer info: PeerInfo { peer_id: [...] }
pbft-engine-1 | ERROR | pbft_engine::engine: | Disconnected from validator; stopping PBFT
pbft-engine-1 | ERROR | pbft_engine:108 | ReceiveError: Received unexpected message type: PING_REQUEST
pbft-engine-1 | DEBUG | sawtooth_sdk::messag | Disconnected outbound channel
pbft-engine-1 | DEBUG | sawtooth_sdk::messag | Exited stream
pbft-engine-1 | DEBUG | zmq:547 | socket dropped
pbft-engine-1 | DEBUG | zmq:547 | socket dropped
pbft-engine-1 | DEBUG | zmq:454 | context dropped
pbft-engine-1 exited with code 1
```

agunde (Wed, 05 Feb 2020 18:10:03 GMT):
@MatthewRubino I am looking into that error. Was PBFT registering when that error occurred?

agunde (Wed, 05 Feb 2020 18:11:41 GMT):
Also what version of PBFT are you running?

MatthewRubino (Wed, 05 Feb 2020 18:22:18 GMT):
it appears to be. edited the above to the full log with some data bits truncated (`...`)

agunde (Wed, 05 Feb 2020 18:30:57 GMT):
Okay looks like it completed registration. I think the issue is that 1.0.1 version of PBFT is built on top of Sawtooth SDK 0.2, but there was a fix that went in to Sawtooth SDK 0.3 to stop unexpected messages from causing the engine to stop. I think we will need a new release of PBFT to work with Sawtooth Validator 1.2.4.

MatthewRubino (Wed, 05 Feb 2020 18:32:18 GMT):
would that sort of thing affect custom TPs too?

agunde (Wed, 05 Feb 2020 18:38:28 GMT):
Yes, the transaction processor part of the rust sdk also had that issue and was fixed in 0.3. What language is your custom TP in?

agunde (Wed, 05 Feb 2020 18:46:38 GMT):
@MatthewRubino No, if the TPs are using the SDKs, they already handle the ping requests. The PingRequest messages were added a long time ago as a way for the validator to detect if a transaction processor went away.

wkatsak (Wed, 05 Feb 2020 19:08:59 GMT):
@agunde @MatthewRubino I think we might be having this issue as well; we just configured a test cluster with 1.2.4 and are having a hard time bringing it up

wkatsak (Wed, 05 Feb 2020 19:09:53 GMT):
Could this issue cause any transient effects, or would it always be a termination?

MatthewRubino (Wed, 05 Feb 2020 20:30:07 GMT):
i always saw the termination anyways. i was running 5 nodes locally and 4 of the pbft engines would terminate

gandhikim (Thu, 06 Feb 2020 01:26:35 GMT):
Has joined the channel.

jamesbarry (Thu, 06 Feb 2020 15:55:38 GMT):

Clipboard - February 6, 2020 8:55 AM

jamesbarry (Thu, 06 Feb 2020 15:55:45 GMT):
@wkatsak @MatthewRubino @agunde @amundson Bill and I have seen the same thing as Matthew. It sounds like we need a new release of PBFT? We are working on a demo to show the issue. I am attaching a snippet of our log file.

amundson (Thu, 06 Feb 2020 16:04:54 GMT):
there will be a release of PBFT today or tomorrow

amundson (Thu, 06 Feb 2020 16:05:36 GMT):
(probably today)

jamesbarry (Thu, 06 Feb 2020 16:05:47 GMT):
@amundson Perfect - Thank you!

madhusudan.rao (Fri, 07 Feb 2020 09:46:53 GMT):
Has joined the channel.

RajaramKannan (Fri, 07 Feb 2020 09:50:14 GMT):

RajaramKannan - Mon Feb 03 2020 10_37_28 GMT+0530 (India Standard Time) (1).txt

RajaramKannan (Fri, 07 Feb 2020 09:56:06 GMT):
Has joined the channel.

RajaramKannan (Fri, 07 Feb 2020 09:56:07 GMT):
Issue #2 is the one I posted yesterday where the consensus get KeyError was appearing. On restarting the node, the validator would immediately exit with a KeyError. On further investigation, we found that we had about 100 batches with transactions that did not cause any state change, so those blocks all had the same state root hash. It appears the validator will then start to fail (if it is up) or will exit (if you bring it down and back up). The /state endpoint for the running node returned "Head Not Found" ("There is no block with the id specified in the 'head' query parameter."). One of our engineers dug into the sawtooth code; it looks like it is trying to rebuild state from some block, and our theory is that the /blocks fetch defaults to only 100 blocks, while the last state root hash change is at the 101st block from the current head. (Just a theory that this is causing the issue. We were able to consistently reproduce it in our lower environments a number of times by running 100 txns that cause no state change. On the 101st, the consensus get KeyError message starts to appear, and on restart the validator exits.)

RajaramKannan (Fri, 07 Feb 2020 10:05:08 GMT):
we are using the 1.1.5 validator with 1.0.1 PBFT

RajaramKannan (Fri, 07 Feb 2020 10:55:25 GMT):
On the 1st issue: yes, when we tried bringing up a node it was stalling in the catchup at that same block. The logs do look a little similar to what I think you had posted (our versions are different though). Will the proposed new PBFT release fix that even if we continue with 1.1.5? @jamesbarry @amundson

jamesbarry (Fri, 07 Feb 2020 19:12:43 GMT):
@RajaramKannan - @agunde pasted this note: "Okay looks like it completed registration. I think the issue is that the 1.0.1 version of PBFT is built on top of Sawtooth SDK 0.2, but there was a fix that went into Sawtooth SDK 0.3 to stop unexpected messages from causing the engine to stop. I think we will need a new release of PBFT to work with Sawtooth Validator 1.2.4."

jamesbarry (Fri, 07 Feb 2020 19:16:09 GMT):
We have seen your issue #1 with our test server. We may have seen issue #2, as it is somewhat similar to what we have seen. We intend to test the new PBFT and see if that fixes our issues, or write a gist to show the problems we are having. At this point we are in a waiting mode for the new PBFT.

MatthewRubino (Fri, 07 Feb 2020 19:19:54 GMT):
fyi we are running with 1.2.4 for everything except the validator, which is on 1.2.3. or are you needing those ZMQ changes?

rbuysse (Fri, 07 Feb 2020 20:03:37 GMT):
PBFT 1.0.2 has been released. You can read the release notes here: https://github.com/hyperledger/sawtooth-pbft/blob/v1.0.2/RELEASE_NOTES.md

RajaramKannan (Sat, 08 Feb 2020 08:03:55 GMT):
@jamesbarry thanks, very much appreciated. On reading @agunde's message once again, I think it might help fix the catchup issue. I presume the unexpected message is the one where, in our case, the invalid block 255 would not stall the PBFT engine and it would potentially continue till it received the valid block? What is still a mystery to me is why block 255 is getting re-published every now and again (all nodes in the network show the head at 886)....

adityasingh177 (Sat, 08 Feb 2020 14:48:32 GMT):
Has joined the channel.

jamesbarry (Sun, 09 Feb 2020 16:30:18 GMT):
@RajaramKannan We have the same issue as you have with your block 255. We get an invalid block and it gets republished. Luckily I am in constant test mode, no production yet. We are trying to separate out our highly customized code, so we can post some active logs of straight Sawtooth and show a demo of where we get stuck. So stay tuned. We will be contributing code back, if we get it to work properly. Otherwise we will be asking for smarter minds than us to help.

RajaramKannan (Mon, 10 Feb 2020 05:56:22 GMT):
@jamesbarry thanks once again. We will wait for your updates and root for your success in getting it to work properly. I guess the republishing itself is a validator issue. I presume then that 1.0.2 PBFT will still help with ensuring at least PBFT does not get stuck on the catchup? (It does work OK on the republishing itself today in 1.0.1 if it is already caught up...)

RajaramKannan (Mon, 10 Feb 2020 05:57:33 GMT):
today the way we are getting over this is by transferring the entire _data files from a current node to the new node so that it doesn't need to catch up, and that is working fine...

gandhikim (Mon, 10 Feb 2020 11:08:07 GMT):
My issue: node1 and node2 don't sync, but the first block syncs successfully. sawtooth-cli (Hyperledger Sawtooth) version 1.2.4, Ubuntu 18.04.
Step 1: run node1
Step 2: set intkey five times
Step 3: run node2
sawtooth-validator log:
```
ERROR proxy] State from block 9e0ed232.... requested, but root hash 71da3e57.... was missing. Returning empty state.
```
poet-engine log:
```
ERROR poet_block_verifier] Block 9e0ed232 rejected: Received block from an unregistered validator 025f88b8...311d7417
```

arsulegai (Mon, 10 Feb 2020 11:50:08 GMT):
@gandhikim Question answered in #sawtooth

jamesbarry (Mon, 10 Feb 2020 15:16:38 GMT):
From Shawn on what the PBFT release 1.0.2 should do to help our issues: "amundson Technical Ambassador - I think that the SDK update was necessary to handle heartbeat messages that the newer validator sends to keep the connection open when there is very low network activity (and a firewall that will time out connections in between)." My team will be installing this over the next couple of days in our test harness to see the results. Unfortunately we can't get to it right away, but if anyone else has results, feel free to share them and we can see if the timeout issues dissipate or disappear.

AnthonyWhite (Tue, 11 Feb 2020 10:24:25 GMT):
Has joined the channel.

jamesbarry (Tue, 11 Feb 2020 18:01:05 GMT):
@RajaramKannan @MatthewRubino @agunde @amundson @rejereggie We have installed the new PBFT v1.0.2 and have run 48k+ transactions in 11 hours on our 5-node test network with no dropped connection issues. I copied several folks who appear to have similar issues to ours. @wkatsak and I believe our dropped-connection issue was resolved with the latest PBFT release. We hope it also works for the rest of you.

rejereggie (Tue, 11 Feb 2020 18:01:05 GMT):
Has joined the channel.

agunde (Tue, 11 Feb 2020 18:01:33 GMT):
Glad to hear!

rejereggie (Tue, 11 Feb 2020 21:34:28 GMT):
jamesbarry thank you for the note. I think I am starting to get a handle on my coding.

pschwarz (Tue, 11 Feb 2020 22:50:41 GMT):
PR for rewriting the permission verifier in rust: https://github.com/hyperledger/sawtooth-core/pull/2251 (as well as a couple of changes that fix issues with permissions and forks; block validation)

jamesbarry (Wed, 12 Feb 2020 00:29:13 GMT):

jamesbarry - Tue Feb 11 2020 17:28:46 GMT-0700 (Mountain Standard Time).txt

jamesbarry (Wed, 12 Feb 2020 00:30:53 GMT):
The file above is our timeout error, which occurred again at 61k transactions. I am dropping in our logging from when it happened. Timeouts again... These transactions are weather data pulled from an API every 10 seconds in our test environment.

RajaramKannan (Wed, 12 Feb 2020 04:02:29 GMT):
@jamesbarry thanks. We are still on 1.1.5 (we will move to 1.2.x later this year, as we have consortium partners that all need to get past their internal IS/Compliance teams). So I am still not clear whether it will solve our issue (my current understanding is that the PBFT engine is unable to handle certain messages from the 1.2.4/newer validator, based on your note above). We will, however, test in the next few days and see if it has some welcome side effect with 1.1.5 to help with the catchup issue.

jamesbarry (Thu, 13 Feb 2020 15:04:38 GMT):

Clipboard - February 13, 2020 8:04 AM

jamesbarry (Thu, 13 Feb 2020 15:08:04 GMT):
@agunde @amundson More information on our crash. We set up three completely separate networks. We wanted to see whether it's IPv4, IPv6 or a mix of networks. Network 1 was IPv4 only, network 2 was IPv6 only, and network 3 was IPv6 running to our three developers' houses and 2 AWS nodes. Each had 5 nodes, with all five nodes pulling in a weather report from a different location every 10 seconds. All of them experienced network issues. When a single validator crashed because of a momentary loss of a network connection, sometimes the node was never able to automatically reconnect. It appears the loss of 2 or more nodes would crash the system; one node only slows the system to a crawl. Will see if anything interesting is in the log and post here. But network fluctuations are causing disconnected nodes that cannot reconnect.

agunde (Thu, 13 Feb 2020 16:08:02 GMT):
@ltseeley ^

ltseeley (Thu, 13 Feb 2020 16:11:03 GMT):
2 nodes going down would halt the network due to there not being enough votes for PBFT to commit anything. Let us know what you find in the logs regarding the disconnections.

jamesbarry (Thu, 13 Feb 2020 20:07:15 GMT):
@ltseeley Thank you for that - I should have known about 2 nodes. We are re-running three new chains now. One question I have: is there a way to limit the number of blocks being added from a backlog on a node that disconnects and tries to reconnect? Or making it stream instead of submitting so many blocks again. Our nodes are Ubuntu 18.04 on AWS medium instances with 4 gigs of RAM. Upon reconnecting, the RAM & CPU max out and the node crashes, hence never connecting back again. We are running AWS large with 8 gigs now, and we will see. But I am wondering if the sudden rush of blocks behind causes issues in processing. Thanks again.

MatthewRubino (Thu, 13 Feb 2020 20:51:27 GMT):
I thought I would echo this. we upgraded to PBFT 1.0.2 and that allowed us to use the validator 1.2.4. we have a 5 node network in test which was hitting the connection issues. those have stopped and it is running smoothly. our 12 node prod network is also continuing to run smoothly.

MicaelFerreira (Fri, 14 Feb 2020 16:21:28 GMT):
Hi, got this message when posting an invalid batch (I'm already at pbft version 1.0.2):
```
sawtooth-validator[11736]: [2020-02-14 15:55:03.482 DEBUG ffi] [src/journal/candidate_block.rs: 420] Batch 92c9a98e8b769a9485c85a19482b31abdaf9ee270f008a67f71be714b16f84832a70464e7f53f75e8284e709274c1e48a0fda9bbfb1e4cab99cf80e868823982 invalid, not added to block
pbft-engine[11772]: ERROR | pbft_engine::engine: | InternalError: Couldn't find 2f commit messages in the message log for building a seal
```

MatthewRubino (Fri, 14 Feb 2020 17:31:53 GMT):
is your config using poet or pbft?

MatthewRubino (Fri, 14 Feb 2020 17:32:16 GMT):
and did you start on poet and switch? if so i think you need to keep the poet-tp

arsulegai (Fri, 14 Feb 2020 17:36:33 GMT):
Why is that?

arsulegai (Fri, 14 Feb 2020 17:37:04 GMT):
Is it a new node addition, or all the old nodes which have processed the transaction to switch to PBFT?

MicaelFerreira (Fri, 14 Feb 2020 17:58:05 GMT):
No consensus switch or node addition. I was testing an action that was invalid (and so was the batch), and the specific node where I posted the batch started logging those `2f commit` errors. This node was also ignored by all 4 other nodes and required a restart to be accepted by them again and be part of the network.

MatthewRubino (Fri, 14 Feb 2020 18:28:10 GMT):
where is the `poet-engine` coming from? maybe it's the container name (but it's running pbft)... threw me off

MicaelFerreira (Mon, 17 Feb 2020 10:12:38 GMT):
Sorry but I must ask this, where do you see any poet-engine logs? I'm confused

MatthewRubino (Mon, 17 Feb 2020 12:12:22 GMT):
last line : `pbft-engine[11772]: ERROR | pbft_engine::engine:`

MatthewRubino (Mon, 17 Feb 2020 12:13:10 GMT):
looking at it now, i don't :( i guess i just misread it repeatedly... sorry about that

wkatsak (Mon, 17 Feb 2020 17:24:02 GMT):
Hello everyone, I'm working with @jamesbarry and we've been trying to debug some connection issues. I apologize for being out of the loop, I was doing some international travel.

wkatsak (Mon, 17 Feb 2020 17:25:16 GMT):
Maybe james mentioned this already, but we've noticed an issue where if a node goes offline (for whatever reason) and comes back, if it has missed enough blocks, the validator will literally kill its host with OOM.

wkatsak (Mon, 17 Feb 2020 17:26:59 GMT):
We've been running on AWS nodes with 4 GB of RAM, and can reliably cause a machine to run out of memory and lock up hard.

wkatsak (Mon, 17 Feb 2020 17:29:50 GMT):
@amundson , @agunde you were tagged earlier, so I am pulling you in now.

jamesbarry (Mon, 17 Feb 2020 18:08:18 GMT):
One quick note: we have the same issue with 8 and 16 gig instances too. We have run over 4.5 million transactions over the weekend on various testnets, and every time we brought a single node offline and reattached it, memory surged prior to re-attaching. We are looking at a Jira to put in and would like some comments prior to writing it up. Can we build in a limiter on RAM usage that can be set with the CLI? That way, you do not have boundless RAM usage on re-attachment.

MatthewRubino (Mon, 17 Feb 2020 19:10:22 GMT):
so we had some issues like that before we got the block info fix. what we experienced: if a node fell more than 5 minutes behind, it would hit the block info bug (basically the timestamp was too old, so the block was deemed invalid). in that case the node would be stuck at whatever block it left off at (or 0 if starting from scratch). the stuck node would attempt to sync forever and consume seemingly infinite resources. in our case kube would evict it over and over again.

MatthewRubino (Mon, 17 Feb 2020 19:12:17 GMT):
i forget your setup, but if you are not using custom TPs or an old block info then it just sounds similar but is not the same issue. if you guys are not using block info, it could be that your transaction processors are not deterministic: at some point they get to a block they cannot get past and get stuck

wkatsak (Mon, 17 Feb 2020 19:23:40 GMT):
We were able to reproduce it with just `intkey`. We wanted to make sure that it wasn't something with our TP.

wkatsak (Mon, 17 Feb 2020 19:24:42 GMT):
The only "interesting" thing that we are doing is submitting a series of `intkey` transactions, where each one has a dependency on the previous transaction.

wkatsak (Mon, 17 Feb 2020 19:25:29 GMT):
Well to be clear, each node is running a process generating `intkey` transactions, and all submissions from each node are dependent on the previous transaction.

wkatsak (Mon, 17 Feb 2020 19:25:51 GMT):
No dependency between nodes though.
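
For context, the chaining described above goes through the `dependencies` field of the transaction header: the validator keeps a transaction pending until every listed transaction ID is committed. A minimal sketch with the Python SDK, assuming the `sawtooth-sdk`, `sawtooth-signing`, and `cbor` packages are installed; the intkey addressing shown is the standard scheme, but the key handling and names here are illustrative:

```python
import os
from hashlib import sha512

import cbor  # intkey payloads are CBOR-encoded
from sawtooth_signing import CryptoFactory, create_context
from sawtooth_sdk.protobuf.transaction_pb2 import Transaction, TransactionHeader

context = create_context('secp256k1')
signer = CryptoFactory(context).new_signer(context.new_random_private_key())

INTKEY_NS = sha512(b'intkey').hexdigest()[:6]  # '1cf126'

def intkey_address(name):
    return INTKEY_NS + sha512(name.encode()).hexdigest()[-64:]

def make_set_txn(name, value, depends_on=None):
    """Build an intkey 'set' transaction, optionally dependent on a prior txn."""
    payload = cbor.dumps({'Verb': 'set', 'Name': name, 'Value': value})
    header = TransactionHeader(
        family_name='intkey',
        family_version='1.0',
        inputs=[intkey_address(name)],
        outputs=[intkey_address(name)],
        signer_public_key=signer.get_public_key().as_hex(),
        # assumes the same key also signs the batch
        batcher_public_key=signer.get_public_key().as_hex(),
        # held as pending until the listed transaction IDs are committed
        dependencies=[depends_on] if depends_on else [],
        payload_sha512=sha512(payload).hexdigest(),
        nonce=os.urandom(16).hex(),
    ).SerializeToString()
    return Transaction(
        header=header,
        header_signature=signer.sign(header),
        payload=payload,
    )

# each transaction depends on the previous one, as described above
prev = None
for i in range(3):
    txn = make_set_txn('counter', i, depends_on=prev)
    prev = txn.header_signature
```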

wkatsak (Mon, 17 Feb 2020 19:26:30 GMT):
When it goes down though, it really kills the AWS node. You can't even ssh into it. Our grafana shows the memory peak right before it craps out.

arsulegai (Tue, 18 Feb 2020 04:36:32 GMT):
I remember this issue from last year, but our tests were on containers so we could control the memory allocated. BTW, were you able to get the OS error status? Check whether the I/O peak caused by the node being down can be handled. A memory overrun shouldn't kill the process/node unless there's a leak; a peak in I/O is expected instead. In our case last year, it was because of a consensus engine error.

puria (Tue, 18 Feb 2020 14:42:32 GMT):
Has joined the channel.

jamesbarry (Tue, 18 Feb 2020 18:52:49 GMT):
@arsulegai Thanks for this note. @wkatsak and I are convinced that we need to put a limit on how many blocks can be processed at a time to catch up a disconnected node. We had a node that crashed every time we tried to reconnect it, because it dumped all the blocks needed to catch up into memory at the same time, thus getting us the out-of-memory error. We are in the process of trying to recreate this on Docker, where we control the memory constraints much more tightly, to see if that helps the issue. But our target market does not use containers. By the way, we ran 4.5 million intkey transactions into our blockchain in 30 hours with 5 nodes before we ran into this; then the entire chain went down hard. We want to get the correct logs to show why we believe you need to moderate the number of blocks processed at a time for a node catching up. Also, the network connection manager has issues prior to dropping the node. We want to reproduce that too, for a total of 2 JIRAs, once we get the reproducibility.

wkatsak (Wed, 19 Feb 2020 14:53:57 GMT):
@jamesbarry @arsulegai @amundson @agunde So, I have a bit of progress on the memory issue. I stood up a v1.2.4 compose environment with 5 validators and PBFT. Validators were limited to 1GB of RAM in compose config. I left one validator offline, then did a few 100k `intkey` transactions. Sure enough, when I try to bring up the last validator the recovery process eats all available memory, and swaps over 2GB of additional data (seems like it requests all blocks at once). If I try with swap disabled, the validator just crashes hard (which kind of makes sense).
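
For anyone reproducing this, the compose-level cap described above is roughly the following (a sketch; the service name, image tag, and limits are illustrative, and `mem_limit`/`memswap_limit` are v2-format compose keys):

```yaml
# docker-compose, version "2.1" file format (illustrative fragment)
services:
  validator-0:
    image: hyperledger/sawtooth-validator:1.2
    mem_limit: 1g       # hard cap: with host swap on, the container swaps
                        # past it; with swap off, the process is OOM-killed
    memswap_limit: 3g   # optional: cap memory + swap together
```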

wkatsak (Wed, 19 Feb 2020 14:55:29 GMT):
Our AWS test machines didn't have swap enabled, so maybe this is the cause of our hard crashes. This seems like it should be carefully thought about, however. Does it really make sense to try to load ALL the behind blocks, then process them?

wkatsak (Wed, 19 Feb 2020 14:56:14 GMT):
Incidentally, when I restarted the containers with an 8GB memory limit, it crashed anyway after a while, with this:
```
sawtooth-pbft-engine-default-4 | INFO | pbft_engine::node:39 | (PP, view 1, seq 121): Received f + 1 ViewChange messages; starting early view change
sawtooth-pbft-engine-default-4 | INFO | pbft_engine::node:16 | (PP, view 1, seq 121): Starting change to view 59
sawtooth-validator-default-4 | [2020-02-19 14:46:15.930 WARNING (unknown file)] [src/journal/block_validator.rs: 284] Error during block validation: BlockValidationError("During validate_on_chain_rules, error creating settings view: NotFound(\"d5f20720b5c99592c3f2f649b7eb5b082bf952aaa3aea615f25c583823d214ce\")")
sawtooth-pbft-engine-default-4 | INFO | pbft_engine::node:48 | (V(59), view 59, seq 121 *): Updated to view 59
sawtooth-validator-default-4 | thread '' panicked at 'No lock holder should have poisoned the lock: "PoisonError { inner: .. }"', src/libcore/result.rs:1188:5
sawtooth-validator-default-4 | fatal runtime error: failed to initiate panic, error 5
sawtooth-pbft-engine-default-4 | INFO | sawtooth_sdk::messag | Received Disconnect
sawtooth-pbft-engine-default-4 | DEBUG | sawtooth_sdk::messag | Exited stream
sawtooth-pbft-engine-default-4 | ERROR | pbft_engine::engine: | ServiceError: Couldn't initialize block after view change due to: ReceiveError: DisconnectedError
sawtooth-pbft-engine-default-4 | DEBUG | zmq:489 | socket dropped
sawtooth-pbft-engine-default-4 | DEBUG | zmq:489 | socket dropped
```

wkatsak (Wed, 19 Feb 2020 14:56:50 GMT):
This guy: `Error during block validation: BlockValidationError("During validate_on_chain_rules, error creating settings view: NotFound(\"d5f20720b5c99592c3f2f649b7eb5b082bf952aaa3aea615f25c583823d214ce\")")` was occurring a lot up to the crash.

wkatsak (Wed, 19 Feb 2020 14:57:02 GMT):
Not sure if this is an orthogonal issue, or what

arsulegai (Wed, 19 Feb 2020 15:06:15 GMT):
@wkatsak that's a great finding; it should be noted on sawtooth-website or in the docs as a caution for cluster administrators.

arsulegai (Wed, 19 Feb 2020 15:07:11 GMT):
On the prior question: there have been requests for a state checkpointing feature, which avoids cycling through all the blocks from genesis when a node comes up in between.

jamesbarry (Wed, 19 Feb 2020 16:41:01 GMT):
@arsulegai @wkatsak In the next several days we will update this feature request from 2/1/2018. Evidently this is an issue that has been around and needs resolution, as it causes a crash that is hard to trace back. More to come in this Jira going forward: STL-972 "Reduce memory consumption on block catchup"

amundson (Wed, 19 Feb 2020 17:12:17 GMT):
@wkatsak OS metrics may be misleading because of the aggressive caching that lmdb performs, so it's important to be very specific about what metrics you are using (and ideally with graphs of the behavior, if you can get the data into that format)

amundson (Wed, 19 Feb 2020 17:17:27 GMT):
before evaluating a block, the completer component within the validator will ensure that the validator has all the previous blocks required before sending the block to the journal. it does this one block at a time (it does not have enough information to do otherwise with the current APIs between validators), and it is very inefficient for long chains. you will observe this as a period where the validator is doing a lot of block requests but not advancing the chain head.

amundson (Wed, 19 Feb 2020 17:22:01 GMT):
however, IIRC, the blocks are persisted to disk during this process as part of the block manager changes between 1.1 and 1.2, but I would have to spend some time diving into it again to verify that

amundson (Wed, 19 Feb 2020 17:25:44 GMT):
that doesn't mean something else isn't eating up memory during that process (or that the use of block manager isn't helping for whatever reason), but there were definitely changes since STL-972 was filed that probably made this better than previously

wkatsak (Wed, 19 Feb 2020 18:27:15 GMT):
@amundson I'd have to look into the details of how lmdb works (does it memory map the file or something? not sure) but this is definitely allocating and using a lot of memory. Like I said, if I disable swap, you can watch the validator process run up to the memory limit (on top), then crash when it cannot allocate any more memory.

amundson (Wed, 19 Feb 2020 20:03:27 GMT):
@wkatsak yes, lmdb uses mmap. I'm not suggesting there isn't a problem, just sensitive to the difficulty in diagnosis/resolution. one of the guys that knows a lot in this area is back from vacation next week. we should be able to get quite a bit of data out of the process. it's a bit more difficult because of the mix of python and rust but its doable.
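
One way to keep the diagnosis specific, as suggested above: lmdb's mmap'd pages are file-backed and reclaimable, so anonymous RSS is the number to watch for a real leak. A small Linux-only sketch (these are standard /proc fields, available since kernel 4.5):

```python
def rss_breakdown(pid):
    """Split a process's resident memory into anonymous vs file-backed pages.

    RssAnon is heap and other real allocations; RssFile includes lmdb's
    mmap'd pages, which the kernel can drop under pressure and which
    therefore don't indicate a leak on their own.
    """
    fields = {}
    with open(f'/proc/{pid}/status') as f:
        for line in f:
            key, _, rest = line.partition(':')
            if key in ('VmRSS', 'RssAnon', 'RssFile'):
                fields[key] = rest.strip()
    return fields

# e.g. rss_breakdown(validator_pid) -> {'VmRSS': '812340 kB', 'RssAnon': ...}
```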

wkatsak (Wed, 19 Feb 2020 21:17:39 GMT):
I've been looking at connection management code...the python/rust boundary is also painful there...

wkatsak (Wed, 19 Feb 2020 22:10:23 GMT):
@jamesbarry @arsulegai @amundson Another interesting result. I've set up a similar network to the last example (docker compose, 5 nodes, pbft). In this case, I am running an additional 5 containers, each of them generating a stream of intkey transactions to a particular validator (with each transaction dependent on the previous one). Once this was running, I stopped and restarted a randomly chosen validator a couple times, and was able to get the network into a state where it wasn't taking any more transactions, with lots of messages like this: ``` sawtooth-validator-default-0 | [2020-02-19 22:09:52.818 DEBUG ffi] [src/journal/candidate_block.rs: 165] Transaction rejected due to missing dependency, transaction 15f0e987b12fda21d15884f165c3d383f42a04c6ebb41f0cbac78e5a00fc4e711b57bb17073ff0808f8838b3191a2ae2d3c4f713e6fa723685bb2b68cbc7721f depends on 54409191748a82f0fd17254836502e974a61f5099119cab4417337500910817f7d7723cf032a6b0db430c1140e3fb1f40bd72175466d004ada273c85e4449fc3 ```

wkatsak (Wed, 19 Feb 2020 22:12:18 GMT):
To try to automate, I brought in `https://github.com/alexei-led/pumba` to my compose config. This is a package to introduce random failures to docker configurations. I configured it to periodically pick a random validator, and pause the container for a bit, then let it resume. This took a little longer, but the network eventually converted into a similar state.

wkatsak (Wed, 19 Feb 2020 22:12:54 GMT):
The accept queues are all blocked up, and return 429s.

wkatsak (Wed, 19 Feb 2020 22:13:06 GMT):
converged* not converted

jsmitchell (Wed, 19 Feb 2020 22:28:53 GMT):
lmdb uses linux fscache pages, which will not result in oom. Those pages will be the first to go when process memory needs to be allocated. The memory utilization on catchup is most likely due to the large number of blocks held in the completer. As @amundson mentioned, some block manager changes may have addressed portions of this, but it is likely that something is holding onto these references, instead of persisting them to disk and reloading later. The sawtooth catchup process could be substantially improved with the addition of two features: 1) a negotiation between the behind node and its peer on most recent common block height and then a protocol which sends those blocks sequentially as they are requested and applied by the behind node. 2) a state checkpointing/transfer mechanism.

jsmitchell (Wed, 19 Feb 2020 22:31:50 GMT):
#1 should be fairly easy and should address the issue for reasonable catchups. It would at a minimum operate with known memory constraints. #2 is significantly more complex and would solve the issue for arbitrary depth catchups with node bring up times related to the size of the database checkpoint.
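
A rough sketch of the negotiation in #1 (nothing like this exists in Sawtooth today; every name here is hypothetical): with a non-forking consensus, agreement at height h implies agreement at every height below it, so the most recent common block can be found by binary search and blocks then requested forward from there in bounded chunks.

```python
def find_common_height(local, peer):
    """Binary search for the highest height where both chains agree.

    `local.block_id(h)` / `peer.block_id(h)` are hypothetical lookups of
    a committed block ID by height; `chain_head_height()` is likewise
    a hypothetical accessor.
    """
    lo = 0
    hi = min(local.chain_head_height(), peer.chain_head_height())
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if local.block_id(mid) == peer.block_id(mid):
            lo = mid      # agree at mid: common point is at or above mid
        else:
            hi = mid - 1  # disagree at mid: common point is below mid
    return lo
```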

jsmitchell (Wed, 19 Feb 2020 22:32:48 GMT):
Future more exotic possibilities involve partial state transfer with branch merkle hashes where the state is transferred incrementally as needed to validate incoming blocks or answer queries.

arsulegai (Thu, 20 Feb 2020 00:46:55 GMT):
@wkatsak that's correct. A solution was discussed for this issue where the pending queue is periodically cleared, making space for new messages. It's interesting though: if it's a random failure of up to 2 nodes, then it should be able to process incoming transactions and expand the pending queue. If you're failing a validator often, well before it can catch up with the others, then I do not know of a way to avoid 429s. But in either case, you should eventually see the 429s go away?

arsulegai (Thu, 20 Feb 2020 00:57:49 GMT):
@jsmitchell both system configuration and application data are put in the same global state. Would that make it difficult to get towards proposal #1 you mentioned? Are you suggesting that the new node query data from others as and when there's a request from the user for it? Maybe the new node has broken links between the most recent block from where it is catching up and the current head. It'll never try to fill in the gap unless requested to do so?

jsmitchell (Thu, 20 Feb 2020 01:27:21 GMT):
@arsulegai no, it’s just a matter of changing the catchup protocol. State marches in concert with block application. If a node is 1000 blocks back, it needs a strategy for catchup — currently that process is recursive from the current head of the network. A better approach would be iteration in chunks based on shared most recent common block between nodes.

jsmitchell (Thu, 20 Feb 2020 01:28:05 GMT):
The partial thing is a future exotic capability.

arsulegai (Thu, 20 Feb 2020 01:36:26 GMT):
How about having that protocol negotiation take a hint from the consensus engine attached to it?

arsulegai (Thu, 20 Feb 2020 01:38:04 GMT):
For example, PBFT/Raft would never fork: if a majority of nodes have a common block, accept it blindly without a proof from the past. But global state sync is still a big problem to solve.

arsulegai (Thu, 20 Feb 2020 01:43:52 GMT):
Does copying the global state from one of the current nodes to the new machine solve that concern? I don't know how good that is practically.

RajaramKannan (Thu, 20 Feb 2020 05:05:13 GMT):
@arsulegai I believe I have posted here and elsewhere a similar issue in catchup - this is with 1.1.5 and PBFT 1.0.1. With just 886 blocks to catch up, the node was getting stuck at 255 (on other nodes, I could see an attempt to republish 255 every now and then even though the current head was at 886 blocks), so I presume that was part of the problem (and it is as yet unexplained why it was happening). (The node that was out of sync was part of PBFT, and it is a small network, so we were unable to publish any new blocks.) In any case, we were able to bring the entire network back up and running by copying over the global state, blocks, etc. (the entire _data folder, if you will) to the new node. Learnt a lot in the process, because during our trial and error we ended up bricking the network, so we actually ended up doing this for all nodes in the network and bringing them up simultaneously. We noticed that in this process we had to restart the validator/PBFT a few times on each node, since some of them were stalling - but it worked after a few restarts (we were using the PBFT logs and the view numbers to guide us in ensuring all nodes eventually reached the same view number). So copying the global state over does work... Not ideal - we use generated custom events to keep an external DB caught up, but we now have a recovery utility that reads each block and resends the events manually! - but it works.
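
For anyone building something similar: the REST API's `/blocks` endpoint pages through the committed chain, so a replay utility could be structured along these lines (a sketch only; the `resend` hook and how events are re-derived from transactions are application-specific):

```python
import requests

def iter_blocks(rest_api='http://localhost:8008'):
    """Walk committed blocks via the REST API's paged /blocks endpoint."""
    url = rest_api + '/blocks?limit=100'
    while url:
        page = requests.get(url).json()
        for block in page['data']:
            yield block
        url = page.get('paging', {}).get('next')  # absent on the last page

def replay(resend):
    """Re-derive and re-send application events block by block."""
    for block in iter_blocks():
        for batch in block['batches']:
            for txn in batch['transactions']:
                resend(block['header']['block_num'], txn)  # placeholder hook
```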

arsulegai (Thu, 20 Feb 2020 05:09:41 GMT):
Interesting and nice findings! Would it be possible to open source the utility for others' benefit?

RajaramKannan (Thu, 20 Feb 2020 05:26:42 GMT):
@arsulegai we are still working on the utility (we used an in progress version since it was an emergency), but happy to open source it once we have it fully working. Just in case I forget, you can hold me to it

RajaramKannan (Thu, 20 Feb 2020 08:09:17 GMT):
question to the team - noticed but ignored this earlier, but once again today we noticed it. We have a PBFT setup with 4 nodes and static peering set up in the respective validator.toml files: Node 1; Node 2 (peers: Node 1 IP); Node 3 (peers: Node 2, Node 3 IPs); Node 4 (peers: Node 2, 3, 4). Today Node 3 went down (more accurately, the containers within). When we brought Node 3 back up, it connected with Node 1 and Node 2 (checked using the /peers API call). But Node 3 and Node 4 did not connect. We had to then restart Node 4 to connect! (Validator version 1.1.5.) Is this a known issue?
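
For reference, the static peering described above lives in each node's `/etc/sawtooth/validator.toml`; Node 4's file would look roughly like this (addresses are illustrative):

```toml
# validator.toml for Node 4 (illustrative endpoints)
peering = "static"
endpoint = "tcp://node4.example.com:8800"
peers = ["tcp://node1.example.com:8800",
         "tcp://node2.example.com:8800",
         "tcp://node3.example.com:8800"]
minimum_peer_connectivity = 3
```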

arsulegai (Thu, 20 Feb 2020 08:43:16 GMT):
How long did you wait for Node 4 when you found it wasn't listing Node 3? Node 4 has to initiate a connection to Node 3. In this case it had 1 failed connection and 2 active successful connections. Eventually it should have caught up with Node 3 as well. But you can also force the minimum peer connectivity setting to 3 for Node 4.

RajaramKannan (Thu, 20 Feb 2020 08:51:18 GMT):
we do have the minimum peer connectivity setting set to 3. We waited several minutes (say 10-15 - but I wasn't timing, so I can't be sure). Would Node 4 try to reconnect? How long would it typically take? (Or perhaps, because the node was down for an entire day, I wonder if the connection attempts have some logic that escalates the timeout each time an attempt fails?)

Michael8086 (Thu, 20 Feb 2020 16:28:05 GMT):
Has joined the channel.

ParitoshPandey (Thu, 20 Feb 2020 18:07:41 GMT):
Has joined the channel.

jamesbarry (Thu, 20 Feb 2020 18:12:46 GMT):

Clipboard - February 20, 2020 11:12 AM

jamesbarry (Thu, 20 Feb 2020 18:12:51 GMT):
@RajaramKannan We have a similar issue. But when our node auto-restarts, it floods transactions onto the node until it runs out of memory, and the AWS instance reboots itself and tries to reconnect, where it gets the transactions and restarts yet again. I am not sure if that is your problem or not. There is simply no gate on the number of transactions to catch up on, and if the number of transactions the node is behind requires more than the available memory, you will continually restart that node.

jsmitchell (Thu, 20 Feb 2020 18:32:12 GMT):
@jamesbarry see my comment above

jsmitchell (Thu, 20 Feb 2020 18:33:41 GMT):
Design discussions/PRs for tackling those feature additions would be very welcome

jamesbarry (Thu, 20 Feb 2020 19:50:47 GMT):
@jsmitchell @RajaramKannan I updated this old Sawtooth feature request in Jira, as evidently this has been on the plate for a while. We will update it with a better plan to implement over the next few days: STL-972 "Reduce memory consumption on block catchup"

RajaramKannan (Fri, 21 Feb 2020 09:00:18 GMT):
@jamesbarry we haven't hit the memory issue yet; our deployments are via docker, and typically we have used reasonably sized EC2 instances. Will keep watch to see if we hit something similar (we don't have the same transaction volumes yet that you have generated, perhaps).

RajaramKannan (Fri, 21 Feb 2020 09:02:15 GMT):
we are, however, seeing the catchup issue in general, as my colleague @ParitoshPandey posted in the other channel

jamesbarry (Mon, 24 Feb 2020 03:52:29 GMT):
I posted Jira issue STL-1700 https://jira.hyperledger.org/browse/STL-1700 tonight. The issue we have had with blocks catching up in a validator is a memory-swap issue. We detailed how to recreate the issue and why you don't see it in a memory-swap-enabled Docker setup.

RajaramKannan (Tue, 25 Feb 2020 07:41:56 GMT):
@jamesbarry as you point out, we may be facing this issue as well, as posted by @ParitoshPandey. However, the node trying to catch up has just ~400 blocks. It is running on a partner's infrastructure, so we don't have direct access to it, making it harder to troubleshoot. They are using the docker compose files provided by us, and we haven't specifically set up any memory swap. Just so we are able to check if it is related: when you mention above that "if the node has too many transactions it is behind and that number is greater than available memory, you will continually restart that node", is there some way to work out an equation that relates the number or size of transactions to the memory needed for the node to catch up? (I am making an assumption, probably incorrect, that it may not be just the number of transactions but perhaps also what it writes to the state/custom events etc.)
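
There probably is no exact equation, but if the completer really buffers every missing block at once, a crude upper bound is blocks-behind times average serialized block size times some object overhead. A back-of-envelope sketch (every default here is an illustrative guess, not a measured Sawtooth number):

```python
def catchup_memory_bytes(blocks_behind, batches_per_block=10,
                         txns_per_batch=1, txn_bytes=1_000, overhead=3.0):
    """Rough upper bound if all missing blocks are held in memory at once.

    `overhead` papers over protobuf/Python object overhead and any
    bookkeeping kept alongside each block.
    """
    block_bytes = batches_per_block * txns_per_batch * txn_bytes
    return int(blocks_behind * block_bytes * overhead)

# ~400 blocks behind, as in the node described above:
print(catchup_memory_bytes(400) / 1e6, "MB")  # 12.0 MB at these guesses
```

At these guesses, 400 small blocks should be nowhere near an OOM, which supports the suspicion that raw block count alone may not be the whole story.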

MatthewRubino (Tue, 25 Feb 2020 15:26:44 GMT):
we definitely hit this issue. we are running in EKS so with kube I do not think you can even define a swap. nodes would sync very slowly, get evicted, and restart

mzins_dev (Tue, 25 Feb 2020 20:55:38 GMT):
Has joined the channel.

wkatsak (Wed, 26 Feb 2020 00:58:11 GMT):
@RajaramKannan @MatthewRubino Docker will usually utilize swap by itself if the node is configured for swapping. AWS Linux nodes are not by default, as it turns out.

wkatsak (Wed, 26 Feb 2020 00:58:33 GMT):
I don't know exactly how much memory would be required; you could probably extrapolate by looking at the growth over time.

MatthewRubino (Wed, 26 Feb 2020 01:12:39 GMT):
yes it will; however (my understanding is) kubernetes will still deem a process a hog regardless of any swap configured, and evict it.

jsmitchell (Wed, 26 Feb 2020 01:21:18 GMT):
This swap behavior is not desirable. A feature to improve the catchup protocol would allow the receiving node to decide how much it wanted to buffer as it was performing validations.

amundson (Wed, 26 Feb 2020 15:53:08 GMT):
@jsmitchell I don't think a special-case catch-up mode is necessary (or a good idea). So catch-up is probably the wrong word here, it's really about efficiently processing chain heads with a much greater block height than the current chain. If we were only handling consensus that had finality, then you could see this purely as a catch-up exercise, but with forking consensus it's not a special case, it's the norm, and the efficiency issue happens when the delta is large. (Point is, it's not a special mode.)

amundson (Wed, 26 Feb 2020 15:56:21 GMT):
The recursive nature of the requests (requesting blocks in reverse order until we have the complete set) is not super problematic, except that a) we seem to be holding onto too much in memory, which could be solved by making sure it gets offloaded to storage; b) for really long deltas, the amount of time to get all the blocks can be a problem because no work is getting done while we get all the blocks.

amundson (Wed, 26 Feb 2020 16:00:19 GMT):
Solving for (b) is difficult for the forking consensus case, which is why we have the recursive approach we have now. Essentially, any efficient solution requires one validator to ask for blocks going forward from a point at which they both share history. The difficulty is in determining that historical shared point efficiently and then adding code to do that negotiation. Once that was solved, we could add a get_blocks(chain_head, starting_height, count) to incrementally get blocks in forward order.

amundson (Wed, 26 Feb 2020 16:04:04 GMT):
However, we also would need to change our approach to handing off those blocks to the journal so that we don't just hold onto them in the completer. Today, when we get a block, we consider it a chain head, get the complete chain up to that point, then ask the journal to consider whether the chain head is better than what we have. Switching that to a more incremental approach requires a bit more journal+completer work. It also assumes that for any given chain head, at every block height, it is the preferred chain (which I'm pretty sure we said isn't a valid assumption in some cases of forking consensus, though that isn't clear).

amundson (Wed, 26 Feb 2020 16:05:18 GMT):
I feel like we should be able to have a hint to the completer/journal that allows us to do the right thing when using consensus engines that have finality, because 90% of the complexity doesn't apply in those cases.

amundson (Wed, 26 Feb 2020 16:06:19 GMT):
Solving (a) should be fairly straight-forward in comparison.

amundson (Wed, 26 Feb 2020 16:11:25 GMT):
For example, when we have finality, we know the point of shared history (it will be the current chain head), and so we can immediately request the next block. It's also completely safe to immediately start applying that block (there is no ambiguity around correctness). The only thing that maybe would need to be done is to make sure that the journal intelligently orders its processing of work, so it's not confused by a fast influx of blocks in a forward manner. I don't recall if it's already smart about this or not, I know we discussed duplicate-work-avoidance previously.

wkatsak (Wed, 26 Feb 2020 17:29:25 GMT):
@amundson @jsmitchell I agree that the swapping is not desired behavior. At the very least, it slows down the process. I only proposed this as a workaround until we can figure out what the right way forward is.

wkatsak (Wed, 26 Feb 2020 17:30:38 GMT):
I haven't had time to dive into this code, but I've gleaned from discussions that what the system is supposed to do is get the blocks and put them into local storage, then do the validation. Is it possible that some code is simply not dropping the references to the blocks, and that is why they are staying in memory?

jamesbarry (Thu, 27 Feb 2020 04:06:29 GMT):
@amundson We have two forking consensuses today, POET (electing new leaders causes the fork) and PBFT (changing the primary node causes the fork), and one that is non-forking, RAFT. If anyone is using the non-forking RAFT, we should allow non-forking consensuses to have a special path when selected, so that can be solved quickly. We can keep the forking consensus with its own code that can be updated as additional consensus mechanisms are added to address different corporate workloads. When adding consensus mechanisms, they can go down the most efficient path (like Proof of Authority, which is non-forking). That might make it easier down the road as more consensus mechanisms are added.

RajaramKannan (Thu, 27 Feb 2020 04:39:04 GMT):
@jamesbarry just for my understanding, when you say PBFT is forking: my understanding was that the final commit phase will always result in non-forking behavior (no node will have a fork in their state/blocks). It is only during the earlier stages (and specifically when the primary node changes and therefore the next node proposes a potentially different block for processing) that there is forking, which gets resolved before anything is committed. (Unlike in POET, where it might commit the fork and resolve later?)

Ashish_ydv (Thu, 27 Feb 2020 07:48:00 GMT):
Has joined the channel.

amundson (Thu, 27 Feb 2020 14:44:57 GMT):
@jamesbarry It might be useful to first agree on what we mean by a fork -- in the context of Sawtooth, we mean that, for the same block height, different nodes have committed different blocks to the chain. This is a property of Nakamoto-style consensus like PoW or PoET. This is because not every node in the network immediately sees all blocks, and so the decisions are not perfect. Eventually, the nodes will adopt the winning block once they receive it, replacing the older block which was previously applied. We often call that process fork resolution. The result is that blocks can essentially end up uncommitted from the chain. This is very complex both from a journal perspective and from a client perspective (clients need some sense that state may completely shift, like using slowly changing dimensions in database materialization of state, etc.). This situation doesn't occur with PBFT or Raft.

amundson (Thu, 27 Feb 2020 14:47:32 GMT):
@wkatsak seems possible, needs more investigation

jamesbarry (Thu, 27 Feb 2020 21:14:33 GMT):
@amundson Thank you for the clarification. I was reading through the docs late last night and got mixed up. I agree that forking does not occur with PBFT or RAFT. That being said, is it possible that upon loading PBFT, RAFT, or any non-forking consensus mechanism, there is a separate mechanism (path) to solve (a) from above? PoET itself seems to be at a standstill on development of a PoET 2.0. Perhaps the path upon load could keep PoET and other forking mechanisms on one way of merging in a new or returning node, while non-forking consensus would be handled another way. While I believe that having a pluggable consensus is a good idea, not all mechanisms can be handled the same way. We can't simply plug them in without taking into account the various ways these mechanisms handle valid blocks. For example, if we end up building for a staking consensus, that can be an even more radical design. It might mean that we need coarse-grained consensus paths upon load that segment the type of consensus properly. I am not in favor of a one-size-fits-all approach to pluggable consensus; as we have seen, use cases can change approaches dramatically, which makes coding even harder. For a consensus to be used, we might need a metadata definition header as the consensus is loaded to determine how blocks and nodes are handled. Just my quick thoughts.

UdayBollineni (Fri, 28 Feb 2020 10:03:15 GMT):
Has joined the channel.

UdayBollineni (Fri, 28 Feb 2020 10:03:16 GMT):
Hi all, I am facing an issue in the validator. I have created a jira ticket for that. https://jira.hyperledger.org/browse/STL-1701

agunde (Fri, 28 Feb 2020 14:00:33 GMT):
@UdayBollineni Hi, can you provide more information about when the error happened? For example, was this on start up? After it had been running for a while? Did any other errors or events happen around it?

wkatsak (Fri, 28 Feb 2020 15:51:50 GMT):
@UdayBollineni @agunde Please check your peer list. It is possible that you have an accidental empty string `''` listed as a peer. This can generate the ZMQ error specified.
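A quick throwaway check for that (assuming the peers end up as a comma-separated list, e.g. from a `--peers` value):
```
peers_arg = "tcp://validator-0:8800,,tcp://validator-1:8800"  # note the double comma
empties = [i for i, p in enumerate(peers_arg.split(",")) if not p.strip()]
if empties:
    print("empty peer entry at position(s):", empties)
```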

wkatsak (Fri, 28 Feb 2020 15:56:44 GMT):
Can you post your `validator.yaml`?

jsmitchell (Fri, 28 Feb 2020 16:00:51 GMT):
I think there are a couple of things getting mixed up here @jamesbarry @wkatsak @amundson , and it would be helpful to keep them separate.

jsmitchell (Fri, 28 Feb 2020 16:03:23 GMT):
@jamesbarry it sounds like you are talking about different consensus approaches and whether the existing interface or an enhanced interface would be capable of covering a spectrum of consensus models. We are very interested in that conversation, specifically regarding enhancements needed or detailed examples of difficulties with plugging novel consensus into such an interface. I propose we have this discussion on the #sawtooth-consensus-dev channel

jsmitchell (Fri, 28 Feb 2020 16:05:05 GMT):
That is a separate issue from the catch up discussion. The concept of catchup being sensitive to forking/non-forking consensus was brought up - let me attempt to explain why that’s not the case.

jsmitchell (Fri, 28 Feb 2020 16:17:11 GMT):
Currently, when a node makes a peer connection, it requests that peer's chain head. The intent is that the remote node shares the newest block it has determined is valid and has committed to its chain. When the node joining the network receives it, it determines if it has the required dependencies to validate it, which includes the parent block. It continues to request, parse, and request the parent blocks until either the dependencies are satisfied or it reaches the first block in the chain (where the parent is a special null block identifier). This process has some significant drawbacks - it carries high latency since each request needs to go through the receive-parse-request loop, it delays the start of validation until the entire dependency chain is unzipped, and it requires an unbounded amount of storage (currently in RAM) during operation. At the end of the day, however, this is just the simple transfer of a linked list of blocks between the "most recent shared common block" and the remote peer's chain head. There are no fork blocks involved or transferred. While this lengthy process is happening, new blocks are being published and received by the new node as well, and they go through this same completion process. Regardless of whether the consensus mechanism forks or not, those blocks will be received and their dependencies will be requested as above. This will likely have the effect of short branch-like structures being added to the end of the linked block list that is being accumulated by the completer. If the remote peer's chain head is on a discarded fork, the same thing will correct it - new blocks received from other nodes will be dependency-resolved to a somewhat earlier point in the chain being requested.
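In rough Python pseudocode, that backward walk is something like this (illustrative only):
```
NULL_BLOCK_ID = ""  # stand-in for the special null block identifier

def complete_chain(network, peer, store, head):
    """Walk parents backward until the dependencies are satisfied."""
    pending = [head]
    block = head
    while block.parent_id != NULL_BLOCK_ID and not store.has(block.parent_id):
        # One round trip per block: the latency cost described above.
        block = network.request_block(peer, block.parent_id)
        pending.append(block)  # held in RAM -- the unbounded storage issue
    return list(reversed(pending))
```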

jsmitchell (Fri, 28 Feb 2020 16:23:06 GMT):
So, none of this is a "mode" and none of it is relevant to how the consensus algorithm considers forks. My proposal above was to enhance the block transfer request process to allow negotiation from the "most recent common shared block" to some arbitrary future block in an ordered list (likely the remote node's chain head). This would allow the blocks to be transferred in order, avoiding the latency of the recursive request-parse-request loop, and allowing the first blocks to begin validation immediately upon receipt. The existing completer functionality would remain intact. The slow existing process would begin requesting dependencies backward from newly received blocks, while the enhanced transfer process would be racing from the beginning. At some point (probably fairly close to the original chain head) they would "meet up" and the completer could stop recursively requesting parents.

wkatsak (Fri, 28 Feb 2020 17:00:44 GMT):
@jsmitchell This sounds like a good solution to the issue. Is there anything that we can do to help make it happen?

jsmitchell (Fri, 28 Feb 2020 17:00:55 GMT):
submit a PR :)

jsmitchell (Fri, 28 Feb 2020 17:01:53 GMT):
(as an aside, I also think it would be a good idea to make the completer's behavior bounded for memory usage with a backing store)

amundson (Fri, 28 Feb 2020 20:25:56 GMT):
@jsmitchell the difference is that, when you have finality, you can very easily immediately request the blocks in a forward order without negotiating the information about a common ancestor -- block height is enough

amundson (Fri, 28 Feb 2020 20:35:11 GMT):
@jamesbarry the current approach is to let consensus "drive" -- which means it is telling the journal (chain controller really) what to do and when to do it. This greatly reduces the need for the journal to know anything about consensus. This is actually a great approach and it works really well architecturally; but if we extend that idea to allow for pluggable chain controllers, we can implement chain controllers that have fewer features (which is why it's interesting to me) and we can implement chain controllers that don't necessarily have to have a single chain (also interesting). splinter's scabbard implementation could mostly be thought of as a chain controller with no chain (ok, we probably need to rename chain controller).

jsmitchell (Fri, 28 Feb 2020 20:36:28 GMT):
@amundson sure, but I don't think a binary search on a set of block signatures is onerous. It's going to be a list of length log2 of the search space
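e.g. (a sketch; `peer_has` stands for one query per probe, and assumes a shared prefix, i.e. if the peer has a block it has all blocks below it):
```
def find_common_ancestor(local_block_ids, peer_has):
    """Binary-search the height of the newest block the peer also has.

    local_block_ids: this node's block ids ordered by height.
    peer_has: callable(block_id) -> bool.
    Uses O(log2(n)) probes of the search space.
    """
    lo, hi = 0, len(local_block_ids) - 1
    best = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if peer_has(local_block_ids[mid]):
            best = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return best  # height of most recent shared block, or -1 if none
```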

amundson (Fri, 28 Feb 2020 20:37:11 GMT):
it's a lot of network traffic and async opportunity for error between the nodes

jsmitchell (Fri, 28 Feb 2020 20:37:42 GMT):
In the usual case, the most recent block the connecting client will have is either a block in the peer's history or the null block identifier (starting from scratch)

jsmitchell (Fri, 28 Feb 2020 20:38:10 GMT):
it's way way less network traffic than what's currently happening

amundson (Fri, 28 Feb 2020 20:39:04 GMT):
yeah, my point is that it's only strictly necessary to do anything of that sort when you have forking consensus, so it should only happen if you are using forking consensus

amundson (Fri, 28 Feb 2020 20:39:19 GMT):
could noop when you have finality

jsmitchell (Fri, 28 Feb 2020 20:39:48 GMT):
right, but that can fall out of the protocol - the existence check would be O(1) with a finality based consensus (assuming you were connecting to a network with shared history)

amundson (Fri, 28 Feb 2020 20:40:26 GMT):
GetBlocks(head, start_height, count) -- where you know start_height automatically when you have finality, and when you don't, you use the negotiation protocol to determine start_height between the nodes.
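Sketched as a request type (illustrative only, not an existing protobuf message):
```
from dataclasses import dataclass

@dataclass
class GetBlocksRequest:
    head_id: str       # the chain head being synced toward
    start_height: int  # known immediately under finality; negotiated otherwise
    count: int         # cap per response to bound message size
```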

jsmitchell (Fri, 28 Feb 2020 20:40:40 GMT):
something which unravels your history to genesis block should probably be an error to the admin

jsmitchell (Fri, 28 Feb 2020 20:40:49 GMT):
if you have history

jsmitchell (Fri, 28 Feb 2020 20:41:06 GMT):
because it means that the network is completely different than the one you were previously attached to

jamesbarry (Fri, 28 Feb 2020 20:45:19 GMT):
@jsmitchell I agree that we should take this over to the #sawtooth-consensus-dev channel. @wkatsak and I would like to see where we can help, as we are not that familiar with the core code & structure. The less the plugged-in consensus needs to know about the block storage, the better. But in the end there are different approaches to broad paths of consensus, unless the validator node only confirms a transaction and discards memory of that transaction once a full node has committed it to the chain. Hedera Hashgraph thinks that is the answer, as it keeps zero history except for mirror nodes that have the chain database. Perhaps we can discuss this in the consensus channel.

UdayBollineni (Tue, 03 Mar 2020 09:34:59 GMT):
@agunde This error happened after a while. There are no other errors with it.

UdayBollineni (Tue, 03 Mar 2020 09:36:16 GMT):
Validator log files are attached in the ticket. I am also attaching the validator YAML files below.

UdayBollineni (Tue, 03 Mar 2020 09:37:34 GMT):
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "7"
  creationTimestamp: "2019-12-11T12:15:20Z"
  generation: 15
  labels:
    name: sawtooth-0
  name: sawtooth-0
  namespace: default
  resourceVersion: "36438859"
  selfLink: /apis/apps/v1/namespaces/default/deployments/sawtooth-0
  uid: e685b828-1c0f-11ea-a58d-42010a8a0160
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 1
  revisionHistoryLimit: 2147483647
  selector:
    matchLabels:
      name: sawtooth-0
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: sawtooth-0
    spec:
      containers:
      - args:
        - -c
        - pbft-engine -vv --connect tcp://sawtooth-0:5050
        command:
        - bash
        image: gcr.io/beriblock-219722/sawtooth-pbft-engine:nightly
        imagePullPolicy: IfNotPresent
        name: sawtooth-pbft-engine
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - -c
        - transaction-processor -v --connect tcp://sawtooth-0:4004
        command:
        - bash
        envFrom:
        - secretRef:
            name: tp-secret
        image: gcr.io/beriblock-219722/processor:upgrade-v0.5
        imagePullPolicy: IfNotPresent
        name: sawtooth-transaction-processor
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /project/sawtooth-tuna
          name: tp-pvc-claim0
      - args:
        - -c
        - |
          if [ ! -f /var/lib/sawtooth/block-00.lmdb ]; then
            echo Running validator startup script
            echo $pbft0priv > /etc/sawtooth/keys/validator.priv
            echo $pbft0pub > /etc/sawtooth/keys/validator.pub
            sawtooth keygen my_key
            sawset genesis -k /root/.sawtooth/keys/my_key.priv -o config-genesis.batch
            sleep 30
            echo sawtooth.consensus.pbft.members=["\"$pbft0pub\",\"$pbft1pub\",\"$pbft2pub\",\"$pbft3pub\""]
            sawset proposal create \
              -k /root/.sawtooth/keys/my_key.priv \
              sawtooth.consensus.algorithm.name=pbft \
              sawtooth.consensus.algorithm.version=1.0\
              sawtooth.consensus.pbft.members=["\"$pbft0pub\",\"$pbft1pub\",\"$pbft2pub\",\"$pbft3pub\""] \
              sawtooth.publisher.max_batches_per_block=1200 \
              sawtooth.identity.allowed_keys=$(cat /etc/sawtooth/keys/validator.pub) \
              -o config.batch
            sawadm genesis config-genesis.batch config.batch
          fi &&
          sawtooth-validator -vv \
            --endpoint tcp://34.82.73.240:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --bind network:tcp://eth0:8800 \
            --scheduler parallel \
            --peering static \
            --maximum-peer-connectivity 10000
        command:
        - bash
        envFrom:
        - configMapRef:
            name: keys-config
        - secretRef:
            name: keys-secrets
        image: gcr.io/beriblock-219722/sawtooth-validator:chime
        imagePullPolicy: IfNotPresent
        name: sawtooth-validator
        ports:
        - containerPort: 4004
          name: tp
          protocol: TCP
        - containerPort: 5050
          name: consensus
          protocol: TCP
        - containerPort: 8800
          name: validators
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/sawtooth/keys
          name: validator-claim0
        - mountPath: /var/lib/sawtooth
          name: validator-claim1
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: validator-claim0
        persistentVolumeClaim:
          claimName: validator-claim0
      - name: validator-claim1
        persistentVolumeClaim:
          claimName: validator-claim1
      - name: tp-pvc-claim0
        persistentVolumeClaim:
          claimName: tp-pvc-claim0
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2019-12-11T12:15:20Z"
    lastUpdateTime: "2019-12-11T12:15:20Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 15
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

UdayBollineni (Tue, 03 Mar 2020 09:39:28 GMT):
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "4"
  creationTimestamp: "2019-12-11T12:15:20Z"
  generation: 8
  labels:
    name: sawtooth-1
  name: sawtooth-1
  namespace: default
  resourceVersion: "36439162"
  selfLink: /apis/apps/v1/namespaces/default/deployments/sawtooth-1
  uid: e6b4b962-1c0f-11ea-a58d-42010a8a0160
spec:
  progressDeadlineSeconds: 2147483647
  replicas: 1
  revisionHistoryLimit: 2147483647
  selector:
    matchLabels:
      name: sawtooth-1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        name: sawtooth-1
    spec:
      containers:
      - args:
        - -c
        - pbft-engine -vv --connect tcp://sawtooth-1:5050
        command:
        - bash
        image: gcr.io/beriblock-219722/sawtooth-pbft-engine:nightly
        imagePullPolicy: IfNotPresent
        name: sawtooth-pbft-engine
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - -c
        - transaction-processor -v --connect tcp://sawtooth-1:4004
        command:
        - bash
        envFrom:
        - secretRef:
            name: tp-secret
        image: gcr.io/beriblock-219722/processor:upgrade-v0.5
        imagePullPolicy: IfNotPresent
        name: sawtooth-transaction-processor
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /project/sawtooth-tuna
          name: tp-pvc-claim1
      - args:
        - -c
        - |
          if [ ! -e /etc/sawtooth/keys/validator.priv ]; then
            echo $pbft1priv > /etc/sawtooth/keys/validator.priv
            echo $pbft1pub > /etc/sawtooth/keys/validator.pub
          fi &&
          sawtooth keygen my_key &&
          sawtooth-validator -vv \
            --endpoint tcp://35.203.133.49:8800 \
            --bind component:tcp://eth0:4004 \
            --bind consensus:tcp://eth0:5050 \
            --bind network:tcp://eth0:8800 \
            --scheduler parallel \
            --peering static \
            --maximum-peer-connectivity 10000 \
            --peers tcp://34.82.73.240:8800
        command:
        - bash
        envFrom:
        - configMapRef:
            name: keys-config
        - secretRef:
            name: keys-secrets
        image: gcr.io/beriblock-219722/sawtooth-validator:chime
        imagePullPolicy: IfNotPresent
        name: sawtooth-validator
        ports:
        - containerPort: 4004
          name: tp
          protocol: TCP
        - containerPort: 5050
          name: consensus
          protocol: TCP
        - containerPort: 8800
          name: validators
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/sawtooth/keys
          name: validator-claim2
        - mountPath: /var/lib/sawtooth
          name: validator-claim3
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: validator-claim2
        persistentVolumeClaim:
          claimName: validator-claim2
      - name: validator-claim3
        persistentVolumeClaim:
          claimName: validator-claim3
      - name: tp-pvc-claim1
        persistentVolumeClaim:
          claimName: tp-pvc-claim1
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2019-12-11T12:15:20Z"
    lastUpdateTime: "2019-12-11T12:15:20Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 8
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

UdayBollineni (Tue, 03 Mar 2020 09:40:22 GMT):
These configuration files are for two validators.

Dan (Wed, 04 Mar 2020 22:12:07 GMT):
Tracy just mentioned .. I think .. that moderators can get access to the chat bot here and we can have it respond to FAQs.

RajaramKannan (Thu, 05 Mar 2020 14:17:56 GMT):
Hi All, it would be great if anyone could provide more information on the backpressure logic. Specifically, we hit issues today when we sent about 30 batches in about 10 seconds; we saw a few of them fail with Queue Full (429). The Sawtooth REST API FAQ says "The number of batches that validator can accept is based on a multiplier, QUEUE_MULTIPLIER (currently 10, formerly 2), times a rolling average of the number of published batches." Specifically:
1. How is the "rolling average" computed? What is it averaged over?
2. Are there ways to bump up that rolling average, or other settings that can potentially reduce the current batch queue? (I know the QUEUE_MULTIPLIER is hardcoded to 10, so there is no way to change it via settings currently or bump it up unless we build a custom validator.)
```
[2020-03-04 22:01:21.875 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 8, limit: 10
[2020-03-04 22:01:22.497 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-04 22:01:22.757 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 8, limit: 10
[2020-03-04 22:01:23.295 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-04 22:01:23.776 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 8, limit: 10
[2020-03-04 22:01:24.418 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-04 22:01:25.017 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 8, limit: 10
[2020-03-05 08:24:06.223 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 12, limit: 10
[2020-03-05 08:26:51.639 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 0, limit: 10
[2020-03-05 08:26:52.159 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-05 09:13:43.768 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 0, limit: 10
[2020-03-05 09:13:44.007 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-05 09:21:00.693 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 0, limit: 10
[2020-03-05 09:21:01.166 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-05 09:21:03.861 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 6, limit: 10
[2020-03-05 09:21:03.990 INFO back_pressure_handlers] Applying back pressure on client submitted batches: current depth: 10, limit: 10
[2020-03-05 11:29:00.381 INFO back_pressure_handlers] Ending back pressure on client submitted batches: current depth: 0, limit: 10
```

amundson (Thu, 05 Mar 2020 15:20:14 GMT):
@RajaramKannan the correct client behavior is to resubmit the batch(es) if that happens. 429 must be interpreted as "try again". the current algorithm is pretty good at ramping up quickly to find the right amount of back pressure, you probably shouldn't try and tune anything in the validator.
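e.g. a minimal submit-with-retry loop against the REST API (a sketch; the endpoint URL and backoff policy are the client's choice):
```
import time
import urllib.request
from urllib.error import HTTPError

def submit_batches(batch_list_bytes, url="http://localhost:8008/batches",
                   retries=5, backoff=0.5):
    req = urllib.request.Request(
        url, data=batch_list_bytes,
        headers={"Content-Type": "application/octet-stream"})
    for attempt in range(retries):
        try:
            return urllib.request.urlopen(req).read()
        except HTTPError as e:
            if e.code != 429:
                raise
            time.sleep(backoff * (2 ** attempt))  # 429 means "try again"
    raise RuntimeError("queue still full after %d attempts" % retries)
```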

Dan (Thu, 05 Mar 2020 17:14:15 GMT):
https://github.com/hyperledger/sawtooth-devmode/pull/18 was blocked on CI whitelist but is clean now. Needs another review.

amundson (Thu, 05 Mar 2020 19:52:02 GMT):
@Dan done, merged

Dan (Thu, 05 Mar 2020 20:08:35 GMT):
Gracias

lucgerrits (Fri, 06 Mar 2020 13:51:35 GMT):
Has left the channel.

arsulegai (Sun, 08 Mar 2020 20:03:33 GMT):
@jamesbarry Please share the link to the SDK repositories you have in Go

RajaramKannan (Mon, 09 Mar 2020 04:29:26 GMT):
@amundson thanks for that, we have already started to build re-submit logic in our app for when a 429 occurs (for the moment we are planning to attempt a couple of times with some gap in the retries). Would you be able to outline the current logic? My fear is that if we have a little burst, there will be a lot of 429s. I am happy to consider, for example, sending a few heartbeat-type transactions if I knew the backpressure ramp-up logic, to keep the rolling average high enough. Ideally, I would like to be able to submit 100 batches in bursts if required.

amundson (Mon, 09 Mar 2020 14:31:17 GMT):
@RajaramKannan basically, each validator independently attempts to figure out the rate at which the network has been operating over the last few blocks. Locally, it sets its pending queue to twice (I believe) that number. If the pending queue fills up, you will get 429s. The pending queue is the batch queue which is used when building new blocks, so each block you would expect it to divide by half under load. When the network is not fully loaded, it will drain the pending queue faster (blocks will be larger than 1/2 the pending queue). When the network is idle, the average rate at which it can process blocks is thus very low; there is a minimum estimate, but I don't recall the specific figure. However, it ramps up very quickly (but is constrained by block publishing speed, as that's the rate of recalculation).
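As a sketch of that mechanism (the constants here are illustrative, not the validator's actual values):
```
QUEUE_MULTIPLIER = 10
MINIMUM_ESTIMATE = 1  # illustrative floor; I don't recall the real figure

def queue_limit(batches_per_recent_block):
    """Limit = multiplier x rolling average of batches per published block."""
    if not batches_per_recent_block:
        return QUEUE_MULTIPLIER * MINIMUM_ESTIMATE
    avg = sum(batches_per_recent_block) / len(batches_per_recent_block)
    return int(QUEUE_MULTIPLIER * max(avg, MINIMUM_ESTIMATE))
```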

Dan (Mon, 09 Mar 2020 18:29:59 GMT):
Was there some pylint update that might cause some build failures? I noticed @rberg2 's doc change failing for example. https://build.sawtooth.me/job/Sawtooth-Hyperledger/job/sawtooth-poet/job/PR-32/4/console
```
09:47:19 lint_1 | ************* Module sawtooth_poet_engine.engine
09:47:19 lint_1 | engine/sawtooth_poet_engine/engine.py:48:4: W0236: Method 'name' was expected to be 'property', found it instead as 'method' (invalid-overridden-method)
09:47:19 lint_1 | engine/sawtooth_poet_engine/engine.py:51:4: W0236: Method 'version' was expected to be 'property', found it instead as 'method' (invalid-overridden-method)
09:47:19 lint_1 | ************* Module sawtooth_poet_engine.oracle
09:47:19 lint_1 | engine/sawtooth_poet_engine/oracle.py:278:0: R1721: Unnecessary use of a comprehension (unnecessary-comprehension)
```

rberg2 (Mon, 09 Mar 2020 20:35:36 GMT):
That one is an @rbuysse PR :)

RajaramKannan (Tue, 10 Mar 2020 06:38:21 GMT):
Thanks, I believe the number is now 10. We will try and observe how this changes when more batches are published, to see how it ramps up.

Dan (Tue, 10 Mar 2020 13:12:19 GMT):
@rberg2 according to autocomplete you two are the same person. I'm sorry I have to be the one to tell you that. I assume there's some common law about property ownership and you guys can work out those details. Meanwhile I think there's still a pylint version thing that might be affecting multiple PRs. Is anyone already aware of that? If not I might be able to look further this afternoon.

MicaelFerreira (Mon, 16 Mar 2020 11:26:04 GMT):
Hi, I was testing stop and start of my TP on one node only, and that node's validator crashed; after restarting, it started logging the 2f messages. I'll add the log file here if you want to take a look.

MicaelFerreira (Mon, 16 Mar 2020 11:26:17 GMT):

pbft-logs-2f-messages.txt

MicaelFerreira (Mon, 16 Mar 2020 11:30:05 GMT):
I'm using pbft 1.0.2

AnthonyWhite (Tue, 17 Mar 2020 10:04:22 GMT):
I have a question for the core contributors of the Hyperledger Sawtooth framework regarding the read and write permissions of each TP on each namespace. I built a quick TP to test this out by changing `sawtooth.settings.vote.authorized_keys`, and I did it with no issue whatsoever. After that I also tested using other reserved namespaces, and the result was the same. I also tried to add an already-used namespace to my TP handler, and that is also possible without any issues or warnings. Is that supposed to be possible?

ParitoshPandey (Tue, 17 Mar 2020 12:20:51 GMT):
Hi all, the validator in one of our nodes is down. Here is the error I see. Any help would be appreciated.

ParitoshPandey (Tue, 17 Mar 2020 12:21:12 GMT):

IMG_1856.PNG

arsulegai (Tue, 17 Mar 2020 19:31:25 GMT):
@AnthonyWhite there's a misunderstanding here. The setting key you pasted isn't for what you want. For restricting TP writes and reads, you could use namespace restrictions. That would be enabled if you use the allowed transaction processors key.

arsulegai (Tue, 17 Mar 2020 19:32:09 GMT):
@ParitoshPandey Which version are you on? Do you have the scenario and more logs?

RajaramKannan (Wed, 18 Mar 2020 04:59:44 GMT):
@arsulegai we are on Validator v1.1.5 and PBFT v1.0.1. The logs might be huge, but is there anything we should look for in them? @ParitoshPandey if we still have the logs, please post maybe a few hours of them. But overall, we noticed some alerts based on monitoring we had set up for errors and for docker exit statuses. There wasn't much activity at that time, no specific scenario as such....

ParitoshPandey (Wed, 18 Mar 2020 05:38:16 GMT):
I am attaching the logs here

ParitoshPandey (Wed, 18 Mar 2020 05:39:25 GMT):

validator-error.log

AnthonyWhite (Wed, 18 Mar 2020 10:30:36 GMT):
I just gave that key as an example. I was worried that someone with a malicious TP could override the settings of the network or another TP's state. Thanks @arsulegai

arsulegai (Wed, 18 Mar 2020 11:34:52 GMT):
Sure, yes in production you would set namespace restriction for TPs

arsulegai (Wed, 18 Mar 2020 11:35:46 GMT):
TP cannot be malicious because it has to be installed on multiple or all the validators depending on the consensus algorithm

arsulegai (Wed, 18 Mar 2020 11:36:11 GMT):
There can be a wrongfully implemented TP though

AnthonyWhite (Wed, 18 Mar 2020 11:37:29 GMT):
Yes, but there is always a chance there can be some purposely hidden code for malicious ends

arsulegai (Wed, 18 Mar 2020 11:42:04 GMT):
You could make sure that a TP has isolation from its namespace; if it must share the namespace, then there should be careful checks

arsulegai (Wed, 18 Mar 2020 11:42:33 GMT):
* careful code reviews before agreeing to deploy the TP to the validator

RajaramKannan (Sun, 22 Mar 2020 07:25:30 GMT):
Interesting execution error and correction: a few days back, we noticed some error logs in our TP. Note: it wasn't a Sawtooth error, but an error (for example, "asset doesn't exist") that the TP itself validates while processing the txn. In these scenarios we log the transaction as failed in our state and send an event back. Interestingly, in this case in our 5-node setup, only one node reported this error. All nodes had the same state and the same version of the TP at the time this occurred. However, a second later, the transaction executed again (in the TP), and this time it went through fine on this node. I don't have the logs to share (I couldn't grab them in a way that obscures some of the more sensitive execution information). But I just wanted to check if anyone has seen this scenario and under what circumstances it could occur. There were no issues in the end outcome, because all nodes ended up with the same state and hence the same events sent out, which synced up our external DB.

Dan (Sun, 22 Mar 2020 15:28:26 GMT):
I don't know if I've seen that specific issue, but usually if one node responds differently it's because there's some tiny non-determinism hiding in the TP.
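Typical culprits, for illustration (a hypothetical apply body, not from any real TP):
```
import random
import time

def apply(transaction, context):
    # Each validator evaluates these at a different moment / with different
    # seeds, so state diverges across the network -- one node can then fail
    # a check (e.g. "asset doesn't exist") that the others pass.
    created_at = int(time.time())  # wall-clock time: non-deterministic
    token = random.random()        # unseeded randomness: non-deterministic
    return created_at, token
```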

jamesbarry (Sun, 22 Mar 2020 20:23:09 GMT):
@RajaramKannan @dan We are seeing this happen on a consistent, but non-regular, basis in our load testing. We are testing the Sextant stable code base, which is a rev of 1.1.x with PBFT. What we believe to be happening, after looking at the logs, is that somehow a transaction got messed up, possibly a bad network connection, etc. When a valid transaction comes in, the bad transaction is disregarded and the node gets back into sync. We use the AWS us-west-2 region, and sometimes the traffic between nodes is flaky. We believe that is the issue. Both we and Blockchain Technology Partners are looking at this issue. When we work out the kinks with the AWS-only testing, we are going to have a mixed node set and run 3 nodes out of our houses around the USA and 2 nodes on AWS. That may be more telling about what is happening. We should take this discussion to the email discussion forum, as it's easier to search and find there than Rocketchat.

RajaramKannan (Mon, 23 Mar 2020 04:57:40 GMT):
thanks @jamesbarry - this is the first time we are seeing it, but at least we know now it is not out of the ordinary and overall integrity is not compromised

RajaramKannan (Mon, 23 Mar 2020 14:05:29 GMT):
When does a pending transaction disappear from the queue? We had a situation today where someone sent a batch while other nodes were down. I believe the nodes were being restarted at this point, including this node. Now this batch is in Pending status on that node, but "unknown" on all other nodes. Meanwhile, new transactions were submitted and they went through/got committed.

RajaramKannan (Mon, 23 Mar 2020 14:08:07 GMT):
@jamesbarry @Dan - I was thinking about it and to clarify, we did not submit the transaction again. The TP logs showed that it got re-executed a second time a second later with the right results and got committed!

dushanchain1 (Fri, 27 Mar 2020 06:46:46 GMT):
Has joined the channel.

dushanchain1 (Fri, 27 Mar 2020 06:46:46 GMT):
Hi, I'm new to Sawtooth and I tried to create and submit a transaction. I'm using the XO transaction family. I submitted a transaction to intkey and it worked successfully. But when I submit my transaction using the XO family, it gives me an "Invalid payload serialization" message. What am I doing wrong? I even asked this on Stack Overflow, but no one answered me. Can someone please help me? My payload is -> const payload = { Name: 'new-game', Action: 'create', Space: '', }, my payloadBytes is -> const payloadBytes = cbor.encode(payload), and my payloadSha512 is -> payloadSha512: createHash('sha512').update(payloadBytes).digest('hex'). My Stack Overflow question -> https://stackoverflow.com/questions/60864191/sawtooth-xo-transaction-family-transaction-submission

MicaelFerreira (Fri, 27 Mar 2020 09:13:42 GMT):
@dushanchain1 Verify that you are also using cbor at your TP for decode
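Also worth checking: if this is the stock XO transaction processor from sawtooth-core, its family spec defines the payload as a UTF-8, comma-separated string ("name,action,space"), not CBOR, so a CBOR-encoded payload would be rejected as an invalid serialization. In Python, for illustration:
```
import hashlib

# XO family payload per its spec: "name,action,space" as UTF-8 text, no CBOR.
# The space is left empty for a "create" action.
name, action, space = "new-game", "create", ""
payload_bytes = ",".join([name, action, space]).encode("utf-8")
payload_sha512 = hashlib.sha512(payload_bytes).hexdigest()
```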

LeonardoCarvalho (Sat, 28 Mar 2020 12:11:19 GMT):
pbft

RajaramKannan (Mon, 30 Mar 2020 10:03:17 GMT):
Hi All - any ideas on this? I am still seeing that batch in pending status in the node. All other nodes show the status as unknown.

arsulegai (Mon, 30 Mar 2020 11:09:25 GMT):
@RajaramKannan This node would broadcast the transaction to other nodes. But since you're saying other nodes were down at that time, the transaction is only with this validator.

arsulegai (Mon, 30 Mar 2020 11:10:07 GMT):
If this validator becomes a leader (based on the consensus algorithm you're following), that's the only way for the node to send it to other validators.

arsulegai (Mon, 30 Mar 2020 11:10:25 GMT):
Or rather resend to other validators.

RajaramKannan (Mon, 30 Mar 2020 11:11:14 GMT):
@arsulegai we are using PBFT - a 5-node network ... The other nodes were down for maybe 10-15 minutes for an upgrade ... so ideally it should have sent it across at some point once they were back up?

arsulegai (Mon, 30 Mar 2020 11:11:24 GMT):
BTW, one way to solve it could be if your client can resend the transaction. The validator which has the transaction already would mark it duplicate but other validators receive the transaction and they can process it.

arsulegai (Mon, 30 Mar 2020 11:12:16 GMT):
Pending batches are not resent after the initial attempt to broadcast, unless there's a request from other validators

RajaramKannan (Mon, 30 Mar 2020 11:12:23 GMT):
Ah got it

RajaramKannan (Mon, 30 Mar 2020 11:12:49 GMT):
is there any way instead to clear out the pending batch? I presume it will continue to stay in the validator queue forever otherwise?

arsulegai (Mon, 30 Mar 2020 11:13:24 GMT):
The pending batch queue is in memory; if the service is restarted, it's cleared

RajaramKannan (Mon, 30 Mar 2020 11:14:24 GMT):
Since we are using containers, if we restart the validator container, it should clear the pending queue, I suppose

RajaramKannan (Mon, 30 Mar 2020 11:14:47 GMT):
thanks so much, always good to know so that we can troubleshoot/document this behavior

arsulegai (Mon, 30 Mar 2020 11:15:15 GMT):
Happy to help

RajaramKannan (Mon, 30 Mar 2020 11:17:09 GMT):
in our case we are retrying if we get a 429. We are also logging all submitted transactions, so if they are pending or lost (say because the entire network was brought down and back up after accepting and before committing it), we can track them. At that point we may not want to resubmit the transaction, because another transaction may supersede it. (We have a job to go and mark that transaction as Failed externally.)

RajaramKannan (Mon, 30 Mar 2020 11:17:17 GMT):
thanks again -

CodeReaper (Mon, 30 Mar 2020 13:32:02 GMT):
Has joined the channel.

CodeReaper (Mon, 30 Mar 2020 13:33:16 GMT):
Hi everyone, I see that there are quite a few open issues in Jira for Sawtooth that have been lying there for a while now. Are these in progress, or have they been rejected? Any information on them? Like STL-1655 specifically?

amundson (Tue, 31 Mar 2020 16:58:45 GMT):
@CodeReaper many of us aren't working directly off of jira, so there is a lot of grooming to do there

jarvis569 (Mon, 06 Apr 2020 18:21:37 GMT):
Has joined the channel.

CodeReaper (Tue, 07 Apr 2020 08:37:28 GMT):
I appreciate all the effort the team has put into the project, and I find the protocol to be a better fit for a lot of use-cases than others, but issue tracking is something required as evidence when proposing any project built on it. This also doesn't give us any visibility into how serious the current issues on Sawtooth are. Please let me know if I'm overlooking anything.

arsulegai (Tue, 07 Apr 2020 18:25:40 GMT):
@CodeReaper how about starting an email thread for discussion, or using this platform for identified issues? If it's about the design in question or deciding on right or wrong, we will get answers. We should be able to fix some of these working together.

CodeReaper (Wed, 08 Apr 2020 10:20:40 GMT):
Thanks @arsulegai, we will get in touch with you for support. Thanks very much. Currently we face block syncing issues with our networks with the latest sawtooth images. Is this a known issue? We can recreate it and share access with you.

arsulegai (Wed, 08 Apr 2020 10:58:42 GMT):
No, it should not occur. Are you on the latest PoET and version 1.2 of Sawtooth?

CodeReaper (Wed, 08 Apr 2020 18:32:11 GMT):
Yes, the images don't seem to have been updated for 2 months and we were utilizing the latest.

arsulegai (Wed, 08 Apr 2020 18:53:20 GMT):
Is it a Z test failure?

CodeReaper (Sat, 11 Apr 2020 17:39:39 GMT):
Haven't yet seen any specific error for it. We are using these parameters, so that shouldn't be the problem: sawtooth.poet.target_wait_time=5, sawtooth.poet.initial_wait_time=25, sawtooth.publisher.max_batches_per_block=100, sawtooth.poet.block_claim_delay=1, sawtooth.poet.key_block_claim_limit=100000, sawtooth.poet.ztest_minimum_win_count=999999999.

arsulegai (Sat, 11 Apr 2020 17:59:32 GMT):
@jamesbarry ^

CodeReaper (Sun, 12 Apr 2020 14:11:39 GMT):
Hi team, I was thinking to try out the latest stable release of Sawtooth, as I was not able to have any of my transactions committed for my own TP, so I decided to try out the steps from https://sawtooth.hyperledger.org/docs/core/releases/latest/app_developers_guide/docker_test_network.html for a PoET-based setup, to confirm there is an issue with the latest stable release itself. I did the exact setup as documented without any customisation. The setup doesn't seem to be giving any issue in itself, but no transactions are being committed to the ledger. Following the documented steps to perform the intkey or settings TP transactions, I found they all remain pending. Is it a known issue that the documented steps on the latest stable release don't work?

arsulegai (Sun, 12 Apr 2020 14:33:43 GMT):
@CodeReaper Try this if you're facing this issue https://sawtooth.hyperledger.org/faq/docker/#why-doesn-t-sawtooth-default-poet-yaml-start-the-network-successfully-on-subsequent-runs

CodeReaper (Sun, 12 Apr 2020 14:43:43 GMT):
Yes that's it thanks a lot.

jamesbarry (Sun, 12 Apr 2020 22:44:34 GMT):
@CodeReaper One suggestion is what my team did, and that is to use the version from Blockchain Technology Partners. https://hub.docker.com/r/blockchaintp/sawtooth-pbft-engine They are back-revved at 1.1.5, but it's stable. This version saved us a lot of the headaches we were having. I am trying to get them to upstream some of their patches that make this a more stable version. I am going to try to put a call together this week to talk through some of their patches and push to get them tested and upstreamed. In the meantime, this version (PBFT only) is pretty stable from what we have seen.

amundson (Mon, 13 Apr 2020 17:20:26 GMT):
@jamesbarry is the source for that somewhere? if not, let's not reward poor community behavior by making such claims here.

CodeReaper (Tue, 14 Apr 2020 08:50:20 GMT):
Appreciate the point out @jamesbarry . I'll try to set it up for stability test runs to check its stability.

RajaramKannan (Tue, 14 Apr 2020 09:36:04 GMT):
@jamesbarry @amundson we ran some tests with the PBFT engine that BTP put up and still find the same catchup issues I have posted elsewhere.... I am in broad agreement with @amundson in general, as it is unclear under what license those are made available, or whether they are under a commercial license with the usage of Sextant

kodonnel (Tue, 14 Apr 2020 18:22:03 GMT):
The source for all of our open source work in hyperledger and elsewhere is always available on our github organization dedicated to such things. In particular sawtooth-pbft-engine, which you may find here: https://github.com/blockchaintp/sawtooth-pbft. Any variations which we are happy with but which vary from upstream are tagged with a "p" level attached to the most relevant upstream version.

kodonnel (Tue, 14 Apr 2020 18:24:45 GMT):
So I'd love to dig into the catch-up issues you have had; in our experience those can have less to do with the consensus engine than with some idiosyncrasies of the timed caches in the validator. As for the license, there is no separate license for those docker images apart from the Apache license, and as I say above, the source is available on our forks.

kodonnel (Tue, 14 Apr 2020 18:25:52 GMT):
Finally for the record, any variants we do make on the code are intended to ultimately be contributed back upstream unless they are backports of items already upstream but for which backports were declined.

kodonnel (Tue, 14 Apr 2020 18:36:40 GMT):
The version of pbft we are currently putting up there on dockerhub is tagged v1.0.1p5.

duncanjw (Tue, 14 Apr 2020 20:48:09 GMT):
@amundson to reiterate what @kodonnel said, and for the benefit of everyone on this channel, here is our official position -
We provide two things -
1. BTP Sawtooth - our distribution of Sawtooth
2. BTP Sextant - our management platform that simplifies its deployment and management on Kubernetes
We follow the Red Hat model, where we provide customers with long term support for our distribution, so we are currently on 1.1.5. We will move to 1.2.x once we think it is stable. However, in the meantime we have significantly improved both the validator and the PBFT engine, addressing a range of issues while carrying out extensive testing with the Tel Aviv Stock Exchange amongst others.
BTP Sextant is only available as a licensed product which includes support for BTP Sawtooth backed by serious SLAs.
*However BTP Sawtooth is freely available.*
Code is on Github - https://github.com/blockchaintp
Artifacts are on DockerHub - https://hub.docker.com/orgs/blockchaintp
As @kodonnel notes, we are committed to contributing all improvements upstream, but clearly customers take priority and get fixes as soon as these are available. Also, in one specific case we backported a 1.2 fix not applied to the 1.1 branch by the community.

duncanjw (Tue, 14 Apr 2020 20:58:44 GMT):
@RajaramKannan hopefully we've cleared up the license point - you only had to ask! Like @kodonnel, I'm sorry you hit some issues. Looking back, the email/slack trail went cold at the end of January and we didn't get any feedback on your catch-up test. That said, just cherry-picking the pbft-engine from our dockerhub might not have been the most sensible thing to do, as we've also made some changes to the validator, particularly wrt catch-up.

duncanjw (Tue, 14 Apr 2020 21:00:29 GMT):
The other thing worth pointing out is that we run all our tests on kubernetes so we are always interested in feedback if folk deploy Sawtooth some other way ...

duncanjw (Tue, 14 Apr 2020 23:10:39 GMT):
@amundson hi. With the Sawtooth report due shortly - delayed due to the cancelled TSC last week, AFAICT - is there an update on the discussion from a while back between @jamesbarry and yourself about the future direction of Sawtooth and the potential role of Splinter? Ditto whether there are plans to contribute Splinter to HL? Thanks in advance

RajaramKannan (Wed, 15 Apr 2020 03:19:02 GMT):
@duncanjw @kodonnel thanks for the clarifications and apologies for the confusion. Regarding the catch-up tests in Jan, I moved all the conversations over to this chat channel, and ultimately we were able to chart a recovery path by copying over the LMDB files (for scenarios where the catch-up is for several hundreds of blocks). But our recent tests show that sometimes the catch-up may fail for even a few blocks (it is not very predictable).

RajaramKannan (Wed, 15 Apr 2020 03:20:24 GMT):
@kodonnel I will ping you on the consensus channel where I posted the catchup issue and logs from PBFT yesterday...

RajaramKannan (Wed, 15 Apr 2020 08:46:13 GMT):
@duncanjw @kodonnel - we did trials with both the BTP validator and the BTP PBFT engine together and see the same results, i.e. the catchup is inconsistent (it works roughly 50% of the time).

amundson (Wed, 15 Apr 2020 13:10:11 GMT):
@duncanjw nothing has changed, it is the plan to use splinter for the networking layer. If you are familiar with splinter, I could dive into the details. But basically, I see two ways of running Sawtooth in the future: one looks like it does now, where we still have a Sawtooth validator process; the other is to run Sawtooth as a splinter service. In both cases, we use splinter's library code in the lower layers.

amundson (Wed, 15 Apr 2020 13:14:09 GMT):
More fundamentally, from a Sawtooth perspective, my plan is to push toward a model where we shift the majority of Sawtooth's implementation to libsawtooth and organize the code in a manner that lets us use the pieces as building blocks. This should pick up in the next few weeks.

SamParks (Wed, 15 Apr 2020 15:12:12 GMT):
Has joined the channel.

duncanjw (Wed, 15 Apr 2020 18:35:48 GMT):
@amundson when you say "the plan"/"my plan", is there an RFC we can review? You mentioned this late last year https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=5Qd23TJwFBD9v6GgE

duncanjw (Wed, 15 Apr 2020 18:36:26 GMT):
But unless I am looking in the wrong place I don't see a corresponding pull request https://github.com/hyperledger/sawtooth-rfcs/pulls

amundson (Wed, 15 Apr 2020 20:28:00 GMT):
@duncanjw not sure what you are driving at

arsulegai (Thu, 16 Apr 2020 09:38:20 GMT):
@duncanjw I think you're right; @amundson is maybe still in the early stage of discussion and getting opinions here. The next step is surely an RFC once the open items/path are discussed

arsulegai (Thu, 16 Apr 2020 09:38:24 GMT):
:)

arsulegai (Thu, 16 Apr 2020 09:43:20 GMT):
There could be actionable items first: convert the remaining Python modules to Rust and move them into libsawtooth. Do you think having a call to kick-start all these discussions would be good?

duncanjw (Thu, 16 Apr 2020 12:33:23 GMT):
@arsulegai that's exactly what I was driving at

MHBauer (Thu, 16 Apr 2020 21:18:30 GMT):
Has left the channel.

Dan (Fri, 24 Apr 2020 19:56:17 GMT):
what do you guys like for making svgs for documentation?

amundson (Fri, 24 Apr 2020 20:33:01 GMT):
omnigraffle

amundson (Fri, 24 Apr 2020 20:33:35 GMT):
libredraw is ok too

Dan (Fri, 24 Apr 2020 20:34:05 GMT):
thanks

YadhuPhilip (Tue, 28 Apr 2020 06:05:28 GMT):
Has joined the channel.

S.GOPINATH (Wed, 29 Apr 2020 07:09:38 GMT):
Has joined the channel.

S.GOPINATH (Wed, 29 Apr 2020 07:09:40 GMT):
hi

S.GOPINATH (Wed, 29 Apr 2020 07:10:39 GMT):
I have a Hyperledger Sawtooth test setup: 4 nodes with the PBFT consensus engine, and each node runs on Ubuntu

S.GOPINATH (Wed, 29 Apr 2020 07:11:37 GMT):
Initially my test application's apply method was called whenever I submitted transactions, and blocks were generated

S.GOPINATH (Wed, 29 Apr 2020 07:12:23 GMT):
When I repeated the transaction during testing, I found that 3 of the nodes generated 3 blocks while one node has only 2 blocks

S.GOPINATH (Wed, 29 Apr 2020 07:12:35 GMT):
how do I debug the issue?

S.GOPINATH (Wed, 29 Apr 2020 07:13:18 GMT):
Now my apply method is not called when I submit the transaction (although the same program is used)

S.GOPINATH (Wed, 29 Apr 2020 07:13:42 GMT):
I could see from the log that the rest-api server submits the transaction to the local validator

S.GOPINATH (Wed, 29 Apr 2020 07:14:02 GMT):
Running `sawtooth peer list`, I could see 3 peers on each of the nodes

S.GOPINATH (Wed, 29 Apr 2020 07:14:30 GMT):
which means that I have a full mesh of validators in my setup.

S.GOPINATH (Wed, 29 Apr 2020 07:14:33 GMT):
Please help

duncanjw (Wed, 29 Apr 2020 16:38:45 GMT):
Following up on the note I posted April 14 we have formally announced BTP Paralos today - https://medium.com/blockchaintp/btp-delivers-first-long-term-support-lts-release-of-its-hyperledger-sawtooth-distribution-73c7bcf38bf1

duncanjw (Wed, 29 Apr 2020 16:48:03 GMT):
Please note >*CTA #1* — If you are working with Hyperledger Sawtooth already then please give our images a try — use tag BTP2.0 to pull the latest stable version of BTP Paralos ...

DavidSetyanugraha (Tue, 05 May 2020 03:39:39 GMT):
Has joined the channel.

sawtooth (Tue, 12 May 2020 18:37:15 GMT):
Has joined the channel.

MicaelFerreira (Wed, 13 May 2020 15:38:15 GMT):
Hi guys, @AnthonyWhite and I are upgrading the sawtooth batch injector feature to also allow block_end events. However we are facing some trouble reading the setting "sawtooth.validator.block_validation_rules" at "journal -> state -> settings_view.rs". The settings we are trying to read are: `sawtooth.validator.block_validation_rules='NofX:1,injector_A;XatY:injector_A,0;NofX:1,injector_B;XatY:injector_B,-1;local:0,-1'` And the error we face is the following: `thread 'PublisherThread' panicked at 'Unable to get setting: EncodingError(WireError(UnexpectedWireType(WireTypeEndGroup)))', src/journal/validation_rule_enforcer.rs:50:17` While running tests locally it works; reading the settings from state does not. Any clue why it is failing to read the rules byte array?

Will_Gluwa (Thu, 14 May 2020 15:33:21 GMT):
Has joined the channel.

yashar.nesvaderani (Thu, 14 May 2020 20:38:40 GMT):
Has joined the channel.

amundson (Sat, 16 May 2020 03:14:12 GMT):
@MicaelFerreira that line here looks like a simple return statement

amundson (Sat, 16 May 2020 03:19:18 GMT):
Looks like a protobuf deserialization error

MicaelFerreira (Sat, 16 May 2020 11:42:29 GMT):
Yes, it is a protobuf deserialization error. I tried to shorten the rule and at some point it works; it looks like it fails to read when the rule is bigger than a certain length, but this doesn't make sense.

Moolkothari (Sat, 23 May 2020 09:57:51 GMT):
Has joined the channel.

S.GOPINATH (Sun, 24 May 2020 17:50:22 GMT):
Hi .. I need help understanding the Sawtooth implementation of the Radix Merkle Trie

S.GOPINATH (Sun, 24 May 2020 17:50:55 GMT):
I'm using address 000001 00000000000000000000000000001

S.GOPINATH (Sun, 24 May 2020 17:51:16 GMT):
that is, the family is 1 and the address under that family is also 1.

S.GOPINATH (Sun, 24 May 2020 17:51:49 GMT):
What does the trie look like? Can anyone help by sketching it with paper and pencil and showing it here?

arsulegai (Mon, 25 May 2020 04:48:31 GMT):
@S.GOPINATH the documentation explains it well. Starting from the root node, each child node is represented with an address made by appending 1 byte to the parent's address. So each node has child nodes addressed 00 to FF. In your case: `Root Node` -> `Child with address 00` -> `Child with address 00 00` -> `Child with address 00 00 01` -> and so on until you reach the leaf node with the complete address.

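For reference, a minimal sketch (plain Python; illustrative, not the validator's actual API) of how a 70-hex-character state address maps to a node path in the Merkle-Radix trie, one byte (two hex characters) per level, as described above:

```python
# Illustrative only: split a Sawtooth state address (35 bytes / 70 hex chars)
# into its trie path, one byte (two hex characters) per level.

def trie_path(address_hex):
    """Node keys from the first level down to the leaf."""
    assert len(address_hex) == 70, "expected a 70-hex-char address"
    return [address_hex[:i] for i in range(2, len(address_hex) + 1, 2)]

# Family prefix 000001, remaining digits zero except a trailing 1 (as in the question).
address = "000001" + "0" * 63 + "1"
print(trie_path(address)[:4])
# ['00', '0000', '000001', '00000100'] ... and so on down to the full 70-char leaf
```
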
S.GOPINATH (Mon, 25 May 2020 06:19:51 GMT):
Thanks again Arun. The docs I refer to are https://sawtooth.hyperledger.org/docs/core/releases/1.2.4/architecture/global_state.html#merkle-radix-tree-overview . They are not as clear as your explanation. Also, I could not find which hash algorithm is used; I don't know whether I have to search further.

S.GOPINATH (Mon, 25 May 2020 12:13:29 GMT):
Hi... in my application I use only one address, 000001 00000000000000000000000000001. I want to calculate the state root hash, given the hash of the payload which is stored at that address. This is for my academic interest.

hidura (Wed, 03 Jun 2020 22:34:04 GMT):
https://github.com/hidura/sawtooth-blockmed Hello friends, this is the 1.0 version of a Sawtooth healthcare blockchain, based on the wonderful work of @AlexanderZhovnuvaty. I intend to use it to provide my country, DR, a healthcare blockchain to track the information of all the patients and the serious diseases that happen every year, mainly to help the poor communities in the country.

rajesh_kumar_p (Thu, 04 Jun 2020 13:47:48 GMT):
Has joined the channel.

amundson (Fri, 05 Jun 2020 17:32:44 GMT):
All - we are planning on having a series of Zoom calls as we do Sawtooth 2 planning/design/architecture. If you have topics you would like to discuss (especially if you have features you're planning to implement), please let me know. I'll post links to the Zoom calls here.

amundson (Fri, 05 Jun 2020 21:35:07 GMT):
We are planning for the first session to be on Monday morning at 9am US/Central

amundson (Fri, 05 Jun 2020 21:35:43 GMT):
Topic: Sawtooth Core Working Session
Time: Jun 8, 2020 09:00 AM Central Time (US and Canada)
Join Zoom Meeting: https://us02web.zoom.us/j/89262216179
Meeting ID: 892 6221 6179
One tap mobile:
+13126266799,,89262216179# US (Chicago)
+19292056099,,89262216179# US (New York)
Dial by your location:
+1 312 626 6799 US (Chicago)
+1 929 205 6099 US (New York)
+1 301 715 8592 US (Germantown)
+1 346 248 7799 US (Houston)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
Find your local number: https://us02web.zoom.us/u/kbfpTnIXUY

amundson (Fri, 05 Jun 2020 21:37:03 GMT):
Initial topics will include moving libsawtooth to its own repo, a sawtooth splinter service, creating consensus library, pluggable journal, etc. but we will limit the meeting to 2 hours.

arsulegai (Sat, 06 Jun 2020 13:31:04 GMT):
@MicaelFerreira @jamesbarry @duncanjw @S.GOPINATH ^

duncanjw (Sat, 06 Jun 2020 13:42:48 GMT):
@arsulegai thanks // @KevinODonnell

amundson (Mon, 08 Jun 2020 13:58:14 GMT):
Here is a link to a working doc for the meeting - https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit?usp=sharing

amundson (Mon, 08 Jun 2020 16:46:26 GMT):
thanks to everyone who participated in the meeting!

amundson (Mon, 08 Jun 2020 23:18:06 GMT):
We will do another working session on Wednesday 10:00 AM US/Central -- topic will be constrained to identifying what components to move from the validator to libsawtooth prior to splitting it off to its own repo

amundson (Mon, 08 Jun 2020 23:18:32 GMT):
Zoom meeting info is the same as above

amundson (Mon, 08 Jun 2020 23:20:23 GMT):
Intent is that we are literally going to sit on the call and go through code, making a list as a group

arsulegai (Tue, 09 Jun 2020 06:37:01 GMT):
@kodonnel @S.GOPINATH @MicaelFerreira @jamesbarry @wkatsak ^

Dan (Wed, 10 Jun 2020 15:21:27 GMT):
Hi coming up in about an hour and half is our first guest speaker for the DCI-WG. If you are available at noon central / 10 pacific / 8:30 Bangalore you are welcome: https://wiki.hyperledger.org/x/kw7cAQ https://zoom.us/my/hyperledger.community.backup

amundson (Wed, 10 Jun 2020 16:19:13 GMT):

[image attachment: Clipboard - June 10, 2020 11:19 AM]

amundson (Wed, 10 Jun 2020 16:19:34 GMT):
^ this is the partial dependency diagram that was used in the working session

amundson (Wed, 10 Jun 2020 17:25:14 GMT):

[image attachment: Clipboard - June 10, 2020 12:25 PM]

amundson (Wed, 10 Jun 2020 17:25:26 GMT):
that was the resulting diagram at the end

amundson (Fri, 12 Jun 2020 14:08:36 GMT):
No meeting today; next one will be in a week on Friday

AnthonyWhite (Tue, 16 Jun 2020 12:06:52 GMT):
Greetings, my colleague @MicaelFerreira and I have forked the sawtooth project so that we could develop the block injector's features, especially the `Block End` injection. We realized that when we set the setting `sawtooth.validator.block_validation_rules`, the serialization of the value gets 2 extra bytes at the beginning of the byte array, and because of them the validator fails to parse it and crashes on one of the network nodes (this only happens when the string value is longer than roughly 67 chars). We've debugged the validator and everything works as it should until it tries to parse the byte array in the `get_setting` implementation for the `SettingsView`. We've made a spike project mimicking the sawtooth flow: creating the `SettingProposal` and the `SettingsPayload` encoded data like the `sawset` CLI does, decoding and parsing it into an `Entry` array inside the `Setting` message, and decoding that to get the `key`/`value` of the setting - and everything went smoothly. But when we do the same thing in the validator, it always fails to parse, so we tried hardcoding the byte array when the key is equal to `sawtooth.validator.block_validation_rules`, and it worked. Currently I believe the problem might be coming from the `StateReader`, but I'm still unsure. Any help from someone who is familiar with the source code would be greatly appreciated! Thank you in advance.

pschwarz (Tue, 16 Jun 2020 16:44:55 GMT):
Did you create a jira ticket for it?

pschwarz (Tue, 16 Jun 2020 16:45:24 GMT):
How are you setting the value?

AnthonyWhite (Tue, 16 Jun 2020 16:50:31 GMT):
No, I haven't. Could you point me in the right direction to do so? Also, would you maybe be available to help us out or just a quick call to discuss a bit of the codebase you helped develop? Regarding the value it's something like `NofX:1,block_info;XatY:block_info,0;NofX:1,example_info;XatY:example_info,-1;local:0,-1`

pschwarz (Tue, 16 Jun 2020 16:54:04 GMT):
https://sawtooth.hyperledger.org/community/contributing/

pschwarz (Tue, 16 Jun 2020 16:54:10 GMT):
This is a good start

pschwarz (Tue, 16 Jun 2020 16:54:20 GMT):
I'd need some more details on the issue

pschwarz (Tue, 16 Jun 2020 16:55:10 GMT):
Are you looking at the python implementation or the rust implementation?

AnthonyWhite (Tue, 16 Jun 2020 16:59:21 GMT):
We have reason to believe the problem is in parsing the bytes in the `Merkle.rs` file in the Rust implementation. If I parse the settings data that I get from the Sawtooth REST API, it's valid, but for some reason the validator receives a byte array with some extra data that can't be parsed, and we have no idea what it contains. By the way, this is all theory; we're still unsure where the problem comes from, but it seems to be from there.

jmbarry (Tue, 16 Jun 2020 17:17:08 GMT):
Has joined the channel.

pschwarz (Tue, 16 Jun 2020 18:31:28 GMT):
Back to my earlier question: how are you setting the value?

pschwarz (Tue, 16 Jun 2020 18:32:16 GMT):
Is it custom code on your end, or are you using the sawtooth commands for this?

AnthonyWhite (Tue, 16 Jun 2020 21:24:18 GMT):
I've set it using the `sawset` CLI, and when that didn't work I managed to get it working by hardcoding the byte array in `SettingsView.get_setting` like this:
```
let setting_opt = if let Some(bytes) = bytes_opt {
    if key != "sawtooth.validator.block_validation_rules" {
        Some(protobuf::parse_from_bytes::<Setting>(&bytes)?)
    } else {
        Some(protobuf::parse_from_bytes::<Setting>(&custom_byte_array)?)
    }
} else {
    None
};
```

pschwarz (Tue, 16 Jun 2020 21:26:03 GMT):
Do you have the same parsing problem with other settings?

MicaelFerreira (Wed, 17 Jun 2020 08:43:48 GMT):
@pschwarz no, other settings work great, but the setting `block_validation_rules` was not meant to work with multiple rules as we are trying now. At least there are no tests for multiple rules.

MicaelFerreira (Wed, 17 Jun 2020 08:49:58 GMT):
That can explain why this issue wasn't noticed before

MicaelFerreira (Wed, 17 Jun 2020 10:51:23 GMT):
Just wanted to leave a quick note here: this documentation https://sawtooth.hyperledger.org/docs/core/nightly/1-1/architecture/injecting_batches_block_validation_rules.html is misleading in the following lines. "A validation rule consists of a name followed by a colon and a comma-separated list of arguments: rulename:arg,arg,...,arg" <- this is not how it's implemented (only the local rule works this way). And "The last transaction in the block has index -1. If abs(Y) is larger than the number of transactions per block, then there would not be a transaction of type X at Y and the block would be invalid." <- this isn't currently implemented. Because the latter wasn't implemented, we had to do it ourselves for the block_end batch injection event

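As a reference for the documented `rulename:arg,arg,...,arg` form (which, per the note above, only the local rule actually follows), here is a minimal illustrative parser; this is a sketch of the documented syntax, not the validator's code:

```python
# Hypothetical parser for the documented rule syntax, e.g.
#   "NofX:1,block_info;XatY:block_info,0;local:0,-1"
# Each rule is "name:arg,arg,...,arg"; rules are separated by ";".

def parse_rules(rules_str):
    rules = []
    for rule in rules_str.split(";"):
        name, _, args = rule.partition(":")
        rules.append((name.strip(), [a.strip() for a in args.split(",")]))
    return rules

print(parse_rules("NofX:1,block_info;XatY:block_info,0;local:0,-1"))
# [('NofX', ['1', 'block_info']), ('XatY', ['block_info', '0']), ('local', ['0', '-1'])]
```
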
pschwarz (Wed, 17 Jun 2020 14:10:54 GMT):
Admittedly, the feature was implemented to the degree that we could support the needs of seth. Hence, why the other modes aren't implemented, nor is there a way to dynamically load batch injectors

pschwarz (Wed, 17 Jun 2020 14:11:36 GMT):
I'd have to look, but I wouldn't be surprised to see some Jira stories for some of the remaining work

pschwarz (Wed, 17 Jun 2020 14:14:40 GMT):
It sounds like there are two issues - the first being that setting the block validation setting produces some invalid content. My guess is that it has more to do with sawset, rather than the validator (all other settings would have the same issue, since they are all written via the same transaction processor). The other issue is the lack of features for block injection.

amundson (Wed, 17 Jun 2020 14:19:26 GMT):
Yeah, that was almost certainly text pulled from the original design doc

AnthonyWhite (Wed, 17 Jun 2020 15:14:59 GMT):
@pschwarz Would you be available to talk with us? Regarding the issue coming from sawset: we've tested it and it doesn't seem to be it. We've debugged from the serialization of the `SettingProposal` and the `SettingsPayload` in sawset, to the deserialization and serialization of the `Entry` array inside the `Settings` message in the settings TP, and it all seems to be ok. But when it tries to parse the byte array using protobuf inside the `get_setting` implementation for the `SettingsView`, it panics. We've tried to deserialize the bytes in our spike but to no avail. If we deserialize the `sawtooth.validator.block_validation_rules` from state using the Sawtooth REST API, then we are able to deserialize it using protobuf. My current guess is a problem with the CBOR deserialization on the validator, because the REST API can get the data with no issues; meaning it's probably not a problem with the serialization but with the deserialization. Next we are going to check how the validator processes the data before sending it to the REST API. Do you know anything related to that?

pschwarz (Wed, 17 Jun 2020 15:32:37 GMT):
Could you send me the exact sawset command you are using?

pschwarz (Wed, 17 Jun 2020 15:32:43 GMT):
(paste it here)

MicaelFerreira (Wed, 17 Jun 2020 15:36:45 GMT):
We are doing it at genesis creation:
```
sawset proposal create \
  -k /etc/sawtooth/keys/validator.priv \
  sawtooth.consensus.algorithm.name=pbft \
  sawtooth.consensus.algorithm.version=1.0 \
  sawtooth.validator.batch_injectors='block_info,svg_fee_info' \
  sawtooth.validator.block_validation_rules='NofX:1,block_info;XatY:block_info,0;NofX:1,svg_fee_info;XatY:svg_fee_info,-1;local:0,-1' \
  -o config.batch
```

MicaelFerreira (Wed, 17 Jun 2020 15:38:49 GMT):
This does not work: `'NofX:1,block_info;XatY:block_info,0;NofX:1,svg_fee_info;XatY:svg_fee_info,-1;local:0,-1'` This works `'NofX:1,block_info;XatY:block_info,0;XatY:svg_fee_info,-1;local:0,-1'` (without the rule "NofX:1,svg_fee_info")

amundson (Wed, 17 Jun 2020 15:44:00 GMT):
It looks to me like the only place EncodingError is generated is in the From impl; you could instrument that to log the bytes it is trying to decode, so you have more information (is it the same bytes that are in state, for example)

amundson (Wed, 17 Jun 2020 15:44:43 GMT):
well, actually, that's just converting the error

amundson (Wed, 17 Jun 2020 15:45:07 GMT):
but, you could instrument get_setting

amundson (Wed, 17 Jun 2020 15:46:25 GMT):
you should set RUST_BACKTRACE=1 so you get the full backtrace

amundson (Wed, 17 Jun 2020 15:49:28 GMT):
in settings_view.rs, around line 122 you will see `let setting_opt = if let Some(bytes) = bytes_opt { Some(protobuf::parse_from_bytes::<Setting>(&bytes)?) ...` - you want to know what bytes are there before parse_from_bytes

MicaelFerreira (Wed, 17 Jun 2020 15:50:27 GMT):
That's exactly where it fails to parse; we are already logging that byte array

amundson (Wed, 17 Jun 2020 15:50:46 GMT):
https://developers.google.com/protocol-buffers/docs/encoding

MicaelFerreira (Wed, 17 Jun 2020 15:51:34 GMT):
and we've tried to hardcode the byte array with a valid one, and it successfully parses it

MicaelFerreira (Wed, 17 Jun 2020 15:51:57 GMT):
so the byte array is already wrong at that point

amundson (Wed, 17 Jun 2020 15:52:02 GMT):
so, what is the value there that it doesn't like?

MicaelFerreira (Wed, 17 Jun 2020 15:53:05 GMT):
this is the bad byte array : ``` [88, 135, 10, 132, 1, 10, 41, 115, 97, 119, 116, 111, 111, 116, 104, 46, 118, 97, 108, 105, 100, 97, 116, 111, 114, 46, 98, 108, 111, 99, 107, 95, 118, 97, 108, 105, 100, 97, 116, 105, 111, 110, 95, 114, 117, 108, 101, 115, 18, 87, 78, 111, 102, 88, 58, 49, 44, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 44, 48, 59, 78, 111, 102, 88, 58, 49, 44, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 44, 45, 49, 59, 108, 111, 99, 97, 108, 58, 48, 44, 45, 49] ```

MicaelFerreira (Wed, 17 Jun 2020 15:54:18 GMT):
the only difference from the good array is the first 2 bytes (I'm trying to find it here in our tests...)

amundson (Wed, 17 Jun 2020 15:55:00 GMT):
what statement in the code are you using to output that?

MicaelFerreira (Wed, 17 Jun 2020 15:55:48 GMT):
`info!("{:?}", byte_array)`

pschwarz (Wed, 17 Jun 2020 15:56:33 GMT):
Another question: which build of the settings transaction family are you using? Is it still the python one, or the rust one?

MicaelFerreira (Wed, 17 Jun 2020 15:57:14 GMT):
rust

amundson (Wed, 17 Jun 2020 16:01:04 GMT):
```58 87 0a 84 01 0a 29 73 61 77 74 6f 6f 74 68 2e 76 61 6c 69 64 61 74 6f 72 2e 62 6c 6f 63 6b 5f 76 61 6c 69 64 61 74 69 6f 6e 5f 72 75 6c 65 73 12 57 4e 6f 66 58 3a 31 2c 62 6c 6f 63 6b 5f 69 6e 66 6f 3b 58 61 74 59 3a 62 6c 6f 63 6b 5f 69 6e 66 6f 2c 30 3b 4e 6f 66 58 3a 31 2c 73 76 67 5f 66 65 65 5f 69 6e 66 6f 3b 58 61 74 59 3a 73 76 67 5f 66 65 65 5f 69 6e 66 6f 2c 2d 31 3b 6c 6f 63 61 6c 3a 30 2c 2d 31```

amundson (Wed, 17 Jun 2020 16:01:38 GMT):
what is an example of some valid bytes at that point?

MicaelFerreira (Wed, 17 Jun 2020 16:02:55 GMT):
Well, I found it: the difference from the bad byte array to the good one is the first 2 bytes, which are not there: ``` [10, 132, 1, 10, 41, 115, 97, 119, 116, 111, 111, 116, 104, 46, 118, 97, 108, 105, 100, 97, 116, 111, 114, 46, 98, 108, 111, 99, 107, 95, 118, 97, 108, 105, 100, 97, 116, 105, 111, 110, 95, 114, 117, 108, 101, 115, 18, 87, 78, 111, 102, 88, 58, 49, 44, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 44, 48, 59, 78, 111, 102, 88, 58, 49, 44, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 44, 45, 49, 59, 108, 111, 99, 97, 108, 58, 48, 44, 45, 49] ```

amundson (Wed, 17 Jun 2020 16:04:09 GMT):
```0a 84 01 0a 29 73 61 77 74 6f 6f 74 68 2e 76 61 6c 69 64 61 74 6f 72 2e 62 6c 6f 63 6b 5f 76 61 6c 69 64 61 74 69 6f 6e 5f 72 75 6c 65 73 12 57 4e 6f 66 58 3a 31 2c 62 6c 6f 63 6b 5f 69 6e 66 6f 3b 58 61 74 59 3a 62 6c 6f 63 6b 5f 69 6e 66 6f 2c 30 3b 4e 6f 66 58 3a 31 2c 73 76 67 5f 66 65 65 5f 69 6e 66 6f 3b 58 61 74 59 3a 73 76 67 5f 66 65 65 5f 69 6e 66 6f 2c 2d 31 3b 6c 6f 63 61 6c 3a 30 2c 2d 31```

jsmitchell (Wed, 17 Jun 2020 16:28:06 GMT):
the one starting with 58 87 looks like valid cbor

jsmitchell (Wed, 17 Jun 2020 16:28:57 GMT):
decodes to "\n\x84\x01\n)sawtooth.validator.block_validation_rules\x12WNofX:1,block_info;XatY:block_info,0;NofX:1,svg_fee_info;XatY:svg_fee_info,-1;local:0,-1"

amundson (Wed, 17 Jun 2020 16:29:40 GMT):
so, 10/0a there is basically field_number=1, wire_type=2 and the next byte (132) would be the length

pschwarz (Wed, 17 Jun 2020 16:30:19 GMT):
What bytes are printed with the shorter string?

amundson (Wed, 17 Jun 2020 16:33:25 GMT):
actually, the next two bytes (132, 1) are the length of 132

jsmitchell (Wed, 17 Jun 2020 16:34:23 GMT):
the spurious bytes at the beginning are the cbor byte array wrapping

MicaelFerreira (Wed, 17 Jun 2020 16:34:30 GMT):
@amundson , I've here the log when it panics, `thread 'PublisherThread' panicked at 'Unable to get setting: EncodingError(WireError(UnexpectedWireType(WireTypeEndGroup)))', src/journal/validation_rule_enforcer.rs:50:17`

jsmitchell (Wed, 17 Jun 2020 16:34:36 GMT):
```
58 87  # bytes(135)
   0A84010A29736177746F6F74682E76616C696461746F722E626C6F636B5F76616C69646174696F6E5F72756C657312574E6F66583A312C626C6F636B5F696E666F3B586174593A626C6F636B5F696E666F2C303B4E6F66583A312C7376675F6665655F696E666F3B586174593A7376675F6665655F696E666F2C2D313B6C6F63616C3A302C2D31
   # "\n\x84\x01\n)sawtooth.validator.block_validation_rules\x12WNofX:1,block_info;XatY:block_info,0;NofX:1,svg_fee_info;XatY:svg_fee_info,-1;local:0,-1"
```

MicaelFerreira (Wed, 17 Jun 2020 16:35:32 GMT):
@pschwarz the shorter setting that works normally: ```[10, 132, 1, 10, 41, 115, 97, 119, 116, 111, 111, 116, 104, 46, 118, 97, 108, 105, 100, 97, 116, 111, 114, 46, 98, 108, 111, 99, 107, 95, 118, 97, 108, 105, 100, 97, 116, 105, 111, 110, 95, 114, 117, 108, 101, 115, 18, 87, 78, 111, 102, 88, 58, 49, 44, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 44, 48, 59, 78, 111, 102, 88, 58, 49, 44, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 44, 45, 49, 59, 108, 111, 99, 97, 108, 58, 48, 44, 45, 49] ```

amundson (Wed, 17 Jun 2020 16:36:43 GMT):
double-cbor'd somewhere?

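The `cbor` package used above shows where the two extra bytes come from: serializing a 135-byte value as a CBOR byte string prepends the header `0x58 0x87` (major type 2, a 1-byte length follows, length 135), which is exactly the `88, 135` prefix in the bad array. A minimal sketch:

```python
import cbor  # the same package used with cbor.loads above

protobuf_bytes = b"\x00" * 135        # stand-in for the 135-byte Setting protobuf
wrapped = cbor.dumps(protobuf_bytes)  # a CBOR byte string wrapping the payload

print(list(wrapped[:2]))  # [88, 135]: 0x58 = byte string w/ 1-byte length, 0x87 = 135
print(len(wrapped))       # 137 = 2-byte CBOR header + 135-byte payload
```
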
amundson (Wed, 17 Jun 2020 16:41:59 GMT):

[image attachment: Clipboard - June 17, 2020 11:41 AM]

amundson (Wed, 17 Jun 2020 16:42:43 GMT):
seems like even the short string above would be >0x17

pschwarz (Wed, 17 Jun 2020 16:43:24 GMT):
I think this issue is fixed in this commit: https://github.com/hyperledger/sawtooth-core/pull/2251/commits/20d7b4f8592e9d31ac75527d8f0b93380712a21c

amundson (Wed, 17 Jun 2020 16:43:26 GMT):
(cbor is truly wild)

pschwarz (Wed, 17 Jun 2020 16:43:29 GMT):
Which hasn't yet been merged

MicaelFerreira (Wed, 17 Jun 2020 16:46:49 GMT):
Hmm it could be

amundson (Wed, 17 Jun 2020 16:46:50 GMT):
that doesn't explain why different lengths would matter

amundson (Wed, 17 Jun 2020 16:51:01 GMT):
@MicaelFerreira you gave the same string, one with the cbor prefix, one without. I'm still not sure how you captured those and whether the different-lengths thing is accurate.

amundson (Wed, 17 Jun 2020 16:53:56 GMT):
@pschwarz kind of surprised anything works if that patch is accurate (which it probably is)

pschwarz (Wed, 17 Jun 2020 16:56:46 GMT):
Very true. Though, the only rust code that currently reads state is the settings view, and most of those values are short

pschwarz (Wed, 17 Jun 2020 16:57:27 GMT):
So if there's some reason that the cbor encoder isn't writing that leading tag, then it probably just works by accident

amundson (Wed, 17 Jun 2020 16:57:50 GMT):
maybe the cbor encoding for some things would accidentally be valid protobuf that would just result in it getting thrown away, but it seems unlikely

pschwarz (Wed, 17 Jun 2020 16:58:04 GMT):
Clearly, I came across this issue in Sept of last year while testing the permissions stuff

amundson (Wed, 17 Jun 2020 16:58:40 GMT):
I would have expected to see this probably crop up in poet testing

pschwarz (Wed, 17 Jun 2020 16:58:46 GMT):
True

amundson (Wed, 17 Jun 2020 16:59:01 GMT):
well

AnthonyWhite (Wed, 17 Jun 2020 16:59:06 GMT):
@pschwarz We are going to try it with the changes from your pull request tomorrow and get back to you

pschwarz (Wed, 17 Jun 2020 16:59:20 GMT):
I'll pull that commit out into its own PR

amundson (Wed, 17 Jun 2020 16:59:23 GMT):
we are probably looking at master and not the 1.2 branch, am I right?

pschwarz (Wed, 17 Jun 2020 16:59:37 GMT):
Yes, in this case

amundson (Wed, 17 Jun 2020 17:00:29 GMT):
wonder if this patch applies to 1.2 then

amundson (Wed, 17 Jun 2020 17:00:52 GMT):
(seems likely)

pschwarz (Wed, 17 Jun 2020 17:01:00 GMT):
Probably

amundson (Wed, 17 Jun 2020 17:02:10 GMT):
@AnthonyWhite @MicaelFerreira good luck, let us know how it goes

MicaelFerreira (Wed, 17 Jun 2020 17:03:31 GMT):
@amundson The one with the cbor prefix we got from the settings view, from the output of the state_reader; the one without the cbor prefix was from our Rust spike using protobuf

AnthonyWhite (Wed, 17 Jun 2020 17:03:45 GMT):
Will do @amundson @pschwarz Thank you for the help

MicaelFerreira (Wed, 17 Jun 2020 17:04:06 GMT):
We are using the branch 1.2, not the master

amundson (Wed, 17 Jun 2020 17:05:22 GMT):
it would be good to run a shorter string through and capture it and see what the encoding is (in the environment that the longer one breaks) if we continue to believe length matters

AnthonyWhite (Wed, 17 Jun 2020 17:06:28 GMT):
@amundson We'll prepare those and the rust stack trace and post it here

amundson (Wed, 17 Jun 2020 17:07:36 GMT):
rust stack traces probably less important now that we have a reasonable theory

MicaelFerreira (Wed, 17 Jun 2020 17:08:28 GMT):
the shorter setting which works: ```[88, 114, 10, 112, 10, 41, 115, 97, 119, 116, 111, 111, 116, 104, 46, 118, 97, 108, 105, 100, 97, 116, 111, 114, 46, 98, 108, 111, 99, 107, 95, 118, 97, 108, 105, 100, 97, 116, 105, 111, 110, 95, 114, 117, 108, 101, 115, 18, 67, 78, 111, 102, 88, 58, 49, 44, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 59, 88, 97, 116, 89, 58, 98, 108, 111, 99, 107, 95, 105, 110, 102, 111, 44, 48, 59, 88, 97, 116, 89, 58, 115, 118, 103, 95, 102, 101, 101, 95, 105, 110, 102, 111, 44, 45, 49, 59, 108, 111, 99, 97, 108, 58, 48, 44, 45, 49] ```

amundson (Wed, 17 Jun 2020 17:10:24 GMT):
ok, give me a sec and I'll try to see what 88,114 means to protobuf

amundson (Wed, 17 Jun 2020 17:11:10 GMT):
(clearly its cbor, but what I want to know is how protobuf will interpret it)

MicaelFerreira (Wed, 17 Jun 2020 17:12:59 GMT):
Just to make it clear, that byte array is the result from state_reader at get_setting() implementation from SettingsView

jsmitchell (Wed, 17 Jun 2020 17:13:20 GMT):
that byte array doesn't parse as cbor

amundson (Wed, 17 Jun 2020 17:14:04 GMT):
88,114 would be string of length 114 right?

amundson (Wed, 17 Jun 2020 17:15:07 GMT):
it's length 116, so it seems like it should be valid

amundson (Wed, 17 Jun 2020 17:16:37 GMT):
the difference between 88,132 and 88,114 is that in protobuf 114 is a one-byte value and 132 will pull in another byte because the highest bit is set

jsmitchell (Wed, 17 Jun 2020 17:20:44 GMT):
i lied, it does parse

jsmitchell (Wed, 17 Jun 2020 17:20:47 GMT):
```>>> cbor.loads(bytes(foo)) b'\np\n)sawtooth.validator.block_validation_rules\x12CNofX:1,block_info;XatY:block_info,0;XatY:svg_fee_info,-1;local:0,-1' ```

amundson (Wed, 17 Jun 2020 17:21:26 GMT):
88 has the meaning wire_type = varint, field = 11. so indeed, that appears to be valid protobuf (stuffing the length of the cbor into field 11)

MicaelFerreira (Wed, 17 Jun 2020 17:21:32 GMT):
ya it should, because that setting works :)

amundson (Wed, 17 Jun 2020 17:21:45 GMT):
which, for most protobuf, will work because we don't have many field 11s

jsmitchell (Wed, 17 Jun 2020 17:21:53 GMT):
crazy

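To make the accident concrete: a protobuf parser reads each tag byte as `(field_number << 3) | wire_type`, and varints carry 7 bits per byte with the high bit as a continuation flag. A quick sketch (plain Python, no protobuf library needed):

```python
# Decode a protobuf tag byte: low 3 bits are the wire type, the rest the field number.
def decode_tag(byte):
    return byte >> 3, byte & 0x07   # (field_number, wire_type)

print(decode_tag(0x58))  # (11, 0): field 11, varint -- the CBOR header byte 88
print(decode_tag(0x0a))  # (1, 2):  field 1, length-delimited -- the real Setting entry

# Varint decoding: 7 bits per byte; a set high bit means another byte follows.
def decode_varint(data):
    value, shift = 0, 0
    for b in data:
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return value

print(decode_varint([114]))     # 114: fits in 7 bits, one byte (the short setting)
print(decode_varint([132, 1]))  # 132: high bit set on 132, so a second byte is read
```
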
amundson (Wed, 17 Jun 2020 17:23:40 GMT):
this seems well-understood to me at this point, I don't think we need the extra work tomorrow beyond testing @pschwarz's fix

amundson (Wed, 17 Jun 2020 17:25:15 GMT):
@pschwarz should they be testing the entire branch or just that one commit? what would we backport?

arsulegai (Wed, 17 Jun 2020 18:24:38 GMT):
Interesting, so protobuf deserialization happened on cbor? Shouldn't this fail?

arsulegai (Wed, 17 Jun 2020 18:45:26 GMT):
Understood

pschwarz (Wed, 17 Jun 2020 19:16:44 GMT):
I think just testing that one commit would be fine

pschwarz (Wed, 17 Jun 2020 22:12:51 GMT):
https://github.com/hyperledger/sawtooth-core/pull/2304

MicaelFerreira (Thu, 18 Jun 2020 10:00:36 GMT):
@pschwarz your commit definitely fixed the issue. It's working great now

MicaelFerreira (Thu, 18 Jun 2020 10:03:14 GMT):
We must thank you guys for the support yesterday, @amundson @pschwarz @jsmitchell

amundson (Thu, 18 Jun 2020 14:22:11 GMT):
Reminder: The next Sawtooth Core Working Session meeting is Friday at 10am US/Central

amundson (Thu, 18 Jun 2020 14:22:45 GMT):
Time: Jun 19, 2020 10:00 AM Central Time (US and Canada)
Join Zoom Meeting: https://us02web.zoom.us/j/89262216179
Meeting ID: 892 6221 6179
One tap mobile:
+13126266799,,89262216179# US (Chicago)
+19292056099,,89262216179# US (New York)
Dial by your location:
+1 312 626 6799 US (Chicago)
+1 929 205 6099 US (New York)
+1 301 715 8592 US (Germantown)
+1 346 248 7799 US (Houston)
+1 669 900 6833 US (San Jose)
+1 253 215 8782 US (Tacoma)
Find your local number: https://us02web.zoom.us/u/kbfpTnIXUY

amundson (Thu, 18 Jun 2020 20:51:21 GMT):
The working doc for the meeting is: https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit#

amundson (Thu, 18 Jun 2020 20:51:59 GMT):
Agenda: next 1.2.x release; 1-3 branch; builds in master; sawtooth library status; sawtooth library RFC

amundson (Thu, 18 Jun 2020 20:52:40 GMT):
note that a draft version of the RFC is in that working doc, at the end, if you want to read it prior to the meeting

mrausnadian (Fri, 19 Jun 2020 09:53:43 GMT):
Has joined the channel.

amundson (Wed, 24 Jun 2020 16:33:18 GMT):
@arsulegai @ltseeley should we move forward with https://github.com/hyperledger/sawtooth-core/pull/2312/files and then build upon it, or are we going to iterate on that PR itself?

arsulegai (Wed, 24 Jun 2020 16:36:33 GMT):
My take on the PR: it's not serving the purpose of abstracting away the Py impl dependency from the place where it is used. However, it does put up a skeleton trait for the Rust impl.

ltseeley (Wed, 24 Jun 2020 16:40:10 GMT):
I don't have a strong preference

amundson (Wed, 24 Jun 2020 16:49:04 GMT):
I don't either. @arsulegai you decide

arsulegai (Wed, 24 Jun 2020 16:52:32 GMT):
Same here, no hard preferences

amundson (Wed, 24 Jun 2020 16:54:35 GMT):
Ok, I'll merge it, because I think it will make the future related PRs less verbose and more focused.

amundson (Wed, 24 Jun 2020 16:54:58 GMT):
I was too slow, now it has conflicts

arsulegai (Wed, 24 Jun 2020 17:19:41 GMT):
Resolved, but the build will take about an hour and a half

amundson (Thu, 25 Jun 2020 15:40:43 GMT):
@ltseeley is investigating https://docs.rs/metrics/0.12.1/metrics/

ltseeley (Thu, 25 Jun 2020 16:48:12 GMT):
Here's a PR to use that metrics crate in the validator: https://github.com/hyperledger/sawtooth-core/pull/2319 Still working on testing it out properly

Will_Gluwa (Fri, 26 Jun 2020 18:02:12 GMT):
Any update on https://github.com/hyperledger/sawtooth-core/pull/1994 / https://jira.hyperledger.org/browse/STL-1477? This is causing lots of problems with our attempted migration from 1.0.5 to 1.2

shantanhunt (Sun, 28 Jun 2020 19:28:17 GMT):
Has joined the channel.

amundson (Tue, 30 Jun 2020 14:21:27 GMT):
@Will_Gluwa I think the PR is abandoned; there was a lot of discussion that didn't get summarized in the PR about it and alternative fixes (none of which were implemented); long story short it is not necessarily the right fix to the problem.

Will_Gluwa (Wed, 01 Jul 2020 04:54:40 GMT):
It seems like a pretty large issue to keep unpatched for so long, no? Are others not impacted by the problem?

amundson (Mon, 06 Jul 2020 17:38:18 GMT):
I opened up a PR with a bunch of backports to 1-2. They looked safe to me, found by comparing branches. Could use multiple eyes on this for sure, and testing if folks have the capacity -- https://github.com/hyperledger/sawtooth-core/pull/2324

amundson (Mon, 06 Jul 2020 17:55:22 GMT):
I think we should make the 1-3 branch point at cd283d14954066af08a3b09aa85708c8cc226e3c, right before we added libsawtooth, then cherry-pick non-libsawtooth things to it

arsulegai (Tue, 07 Jul 2020 10:46:06 GMT):
PR for moving modules from validator to libsawtooth https://github.com/hyperledger/sawtooth-core/pull/2323

amundson (Tue, 07 Jul 2020 17:02:25 GMT):
Moving sawtooth-core rust code into the sawtooth crate is done, so we are moving forward with populating the sawtooth-lib repo and purging sawtooth-core's master of that code. The plan is to create "master" and "0-3" branches in sawtooth-lib and point the validator at "0-3" for now (master becoming 0.4).

amundson (Tue, 07 Jul 2020 17:03:44 GMT):
The next Sawtooth working session will be on Friday 10am US/Central (same zoom URL as posted previously)

amundson (Tue, 07 Jul 2020 17:05:09 GMT):
Agenda will roughly be: review progress toward sawtooth-lib; immediate next steps (protocol layer, adopt Transact)

amundson (Tue, 07 Jul 2020 17:06:28 GMT):
and 1-3/master branch updates/status for sawtooth-core as well

amundson (Tue, 07 Jul 2020 17:10:11 GMT):
We are actively seeking assistance rewriting the REST API and CLIs in Rust short-term, because it will be necessary soon in order to rewrite the integration tests, because we want to do full-stack tests within a single process

amundson (Tue, 07 Jul 2020 17:10:30 GMT):
so, if anyone is looking for a way to help... :)

Dan (Tue, 07 Jul 2020 17:10:42 GMT):
What is rust?

amundson (Tue, 07 Jul 2020 17:10:58 GMT):
its like python

Dan (Tue, 07 Jul 2020 17:11:08 GMT):
lol

amundson (Tue, 07 Jul 2020 17:13:15 GMT):
if someone wants a more technical challenge, adding a backward-compatible transaction processor backend to Transact would be good, otherwise we likely lose that compatibility short-term

Dan (Tue, 07 Jul 2020 17:13:59 GMT):
ok, I fixed the python code.. just added `__declspec(rust)` before each function. I think that's it.

amundson (Tue, 07 Jul 2020 17:14:30 GMT):
we are planning to compile in all Sawtooth-provided smart contracts into the validator, so we will not have separate processes for them

Dan (Tue, 07 Jul 2020 17:15:29 GMT):
you mean like settings, identity, etc.?

amundson (Tue, 07 Jul 2020 17:15:49 GMT):
yes - settings, identity, sabre, xo, intkey, etc.

Dan (Tue, 07 Jul 2020 17:16:42 GMT):
does that have any deployment/upgrade implications? Like if I want to deploy intkey v2 then I need to restart the validator instead of pushing live?

amundson (Tue, 07 Jul 2020 17:17:23 GMT):
yes, which is why apps should use sabre or another on-chain smart contract engine

amundson (Tue, 07 Jul 2020 17:19:43 GMT):
another interesting thing to do for backward compatibility would be to figure out how to run existing python smart contracts inside sabre with minimal modifications

AnthonyWhite (Wed, 08 Jul 2020 15:00:58 GMT):
Hi guys, I have a question regarding the block validation rules related to this pull request https://github.com/hyperledger/sawtooth-core/pull/2314. For some reason the validation rules are enforced every time a batch is added, by calling the `validation_rule_enforcer::enforce_validation_rules` method, which leads to duplicate validations on some batches. For rules that use negative values referencing the end of the block (`XatY`), or rules that cap the number of transactions of a certain family (`NofX`), not all validations can be done every time a batch is added, or doing them then is unnecessary from my point of view. Is this necessary for some special reason, or could all the validations be done just before finalizing the block?

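For context on the negative `XatY` index: per the documentation quoted further up ("The last transaction in the block has index -1"), it resolves the way Python's negative indexing does. A sketch with names of my own choosing, not the validator's code:

```python
# Hypothetical XatY check: "a transaction of family X must sit at index Y",
# where negative Y counts from the end of the block (-1 = last transaction).
def x_at_y_ok(txn_families, family, y):
    index = y if y >= 0 else len(txn_families) + y
    return 0 <= index < len(txn_families) and txn_families[index] == family

txns = ["block_info", "intkey", "intkey", "svg_fee_info"]
print(x_at_y_ok(txns, "block_info", 0))     # True: first txn is block_info
print(x_at_y_ok(txns, "svg_fee_info", -1))  # True: last txn is svg_fee_info
```

A rule like this can only be checked conclusively once the block's final length is known, which is the crux of the question above.
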
arsulegai (Wed, 08 Jul 2020 18:41:41 GMT):
@AnthonyWhite can you please explain duplicate validations?

Will_Gluwa (Wed, 08 Jul 2020 23:00:41 GMT):
Is there a working doc for Friday's meeting?

arsulegai (Thu, 09 Jul 2020 03:25:44 GMT):
@Will_Gluwa here it is: https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=chLykj5oaF75nGpuG

Will_Gluwa (Thu, 09 Jul 2020 03:26:10 GMT):
Thanks @arsulegai !

AnthonyWhite (Thu, 09 Jul 2020 08:50:05 GMT):
Basically, every time a batch is added it validates all the pending batches in the line https://github.com/hyperledger/sawtooth-core/blob/1-2/validator/src/journal/candidate_block.rs#L273

arsulegai (Thu, 09 Jul 2020 17:42:25 GMT):
Isn't it batches to be added only?

amundson (Fri, 10 Jul 2020 14:41:33 GMT):
Reminder, the URL for Today's 10am (US/Central) working session is https://us02web.zoom.us/j/89262216179

amundson (Fri, 10 Jul 2020 14:41:56 GMT):
The working session doc is - https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit#

kodonnel (Fri, 10 Jul 2020 16:58:54 GMT):
moving over to here https://chat.hyperledger.org/channel/transact?msg=uQDGq3NxPfuz5PpHR

kodonnel (Fri, 10 Jul 2020 17:00:19 GMT):
Actually, looking through transact: is it just a matter of having an alternate implementation of something like `libtransact/src/sawtooth.rs` that remotes the calls?

kodonnel (Fri, 10 Jul 2020 17:03:18 GMT):
probably more generically than sawtooth though would be good

kenty (Sat, 11 Jul 2020 01:43:34 GMT):
Has joined the channel.

kenty (Sat, 11 Jul 2020 01:46:09 GMT):
I don't see a search box in the splinter website. I think we should keep one for sawtooth v2.0 docs. I use it quite a lot for CHSA.

arsulegai (Sat, 11 Jul 2020 09:18:25 GMT):
Sounds like a good option

Dan (Mon, 13 Jul 2020 16:38:57 GMT):
several projects including the linux kernel are adopting more inclusive terminology to replace master/slave and blacklist/whitelist. I've audited our project and there is very little to be fixed. There's some poet documentation, the jenkins whitelist check, and then 3rd party files like pylint. I think we should address our files but not modify the 3rd party files. I've created one PR just to get feedback before tackling the jenkins files. Please take a look and provide feedback: https://github.com/hyperledger/sawtooth-poet/pull/48

kenty (Mon, 13 Jul 2020 17:32:59 GMT):
I am in the LMWG meeting and was wondering if we can get any analytics on sawtooth documentation usage, eg. which versions were most accessed. At the moment, I can only see this for sawtooth: https://lfanalytics.io/projects/hyperledger%2Fsawtooth/dashboard

Dan (Mon, 13 Jul 2020 17:45:01 GMT):
@kenty I don't have access to any analytics. This is the main docs page though which has links to each version. https://sawtooth.hyperledger.org/docs/

Dan (Mon, 13 Jul 2020 17:46:20 GMT):
@ryanbeck I'm looking at renaming whitelist as above. It looks like each repo has the same whitelist bin script. Would it make sense to remove that from all repos and just have it local to jenkins? It looks like jenkins already relies on a centralized auth list to feed to that script.

Dan (Mon, 13 Jul 2020 17:50:36 GMT):
or maybe the right handle is @rbuysse

kenty (Mon, 13 Jul 2020 17:50:57 GMT):
@Dan thanks, I am thinking about how to revamp the existing docs site and prepare for v2.0

rbuysse (Mon, 13 Jul 2020 17:51:25 GMT):
I don't think it should get removed

rbuysse (Mon, 13 Jul 2020 17:51:38 GMT):
rename is fine though

Dan (Mon, 13 Jul 2020 17:54:45 GMT):
I was thinking it might be more secure to have that file out of possible attacker control. Is there a benefit to having the script in each repo rather than local to the jenkins server?

rbuysse (Mon, 13 Jul 2020 17:59:31 GMT):
portability and transparency is the main thing

rbuysse (Mon, 13 Jul 2020 17:59:48 GMT):
we do readTrusted on that file so it's not an attack vector https://www.jenkins.io/doc/pipeline/steps/workflow-multibranch/#readtrusted-read-trusted-file-from-scm

AnthonyWhite (Tue, 14 Jul 2020 08:26:13 GMT):
Yes, I'll rephrase: every time the `add_batch` method is called, the batch is added to the vector `batches_to_add`, `validation_rule_enforcer::enforce_validation_rules` is called with all the previously added batches (`pending_batches`), and if it passes, the new batch is added to `pending_batches`

amundson (Fri, 24 Jul 2020 14:13:45 GMT):
Please refrain from using threading on conversations here. We don't have the time or patience to click on every thread separately.

amundson (Fri, 24 Jul 2020 14:18:54 GMT):
Next working session will be Friday, Aug 7 at 10am

amundson (Mon, 27 Jul 2020 20:39:18 GMT):

[image attachment: Sawtooth code dependencies (between components)]

amundson (Thu, 06 Aug 2020 22:09:25 GMT):
Reminder, the Zoom for the 10am (US/Central) meeting tomorrow is: https://us02web.zoom.us/j/89262216179

amundson (Thu, 06 Aug 2020 22:55:24 GMT):
the working doc for the meeting is - https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit#

Dan (Fri, 07 Aug 2020 13:05:09 GMT):
We merged the whitelist->authlist change in core a few weeks ago. That same commit is rippled to the other repos. Really easy PR reviews: https://github.com/search?q=org%3Ahyperledger+Revise+jenkins+scripts+for+inclusive+terminology&type=Issues

Dan (Fri, 07 Aug 2020 13:10:02 GMT):
Anyone have a summary of the CI change?
```
Remove existing build/ci infrastructure
Put new build/ci infrastructure in place (borrowing from Splinter and Grid work)
```
Not sure if I can make the 10 am call.

amundson (Fri, 07 Aug 2020 14:56:54 GMT):
@Dan it's not actual CI stuff (Jenkins, whatever) - it's the docker images and how they build the site

Dan (Fri, 07 Aug 2020 14:57:14 GMT):
cool. thanks.

amundson (Fri, 07 Aug 2020 14:57:38 GMT):
it is present currently in the grid-docs refresh branch

amundson (Fri, 07 Aug 2020 14:57:57 GMT):
you can clone it and do 'just run' to get the site up quickly. good stuff.

Dan (Mon, 24 Aug 2020 17:21:32 GMT):
@rbuysse @rberg2 We committed a change to sawtooth-core for inclusive wording with the jenkins scripts. I've got replicas of that PR for the other repos. If you or others want to approve those, we can get uniformity across the repos: https://github.com/search?q=org%3Ahyperledger+sawtooth+inclusive&type=Issues

Will_Gluwa (Wed, 26 Aug 2020 03:43:25 GMT):
Is there any reason to believe that using FQDNs instead of IPv4 addresses with Sawtooth will cause performance issues?

MicaelFerreira (Wed, 26 Aug 2020 16:27:56 GMT):
Hi there, I'm trying to run the docker-compose run-lint at branch 1-3 of sawtooth-core, but I'm facing a lot of issues accessing the repositories; any ideas?

MicaelFerreira (Wed, 26 Aug 2020 16:28:00 GMT):

[image attachment: Clipboard - August 26, 2020 5:27 PM]

MicaelFerreira (Wed, 26 Aug 2020 16:28:32 GMT):
The cmd i ran `docker-compose -f docker/compose/run-lint.yaml up --abort-on-container-exit --exit-code-from lint-validator lint-validator`

Dan (Wed, 26 Aug 2020 17:37:12 GMT):
are you behind a proxy and if so do you have environment variables like https_proxy, HTTPS_PROXY, etc set?

amundson (Wed, 26 Aug 2020 18:53:59 GMT):
wonder if adding --build there would help. I think maybe that could be a docker caching problem if not a proxy problem

MicaelFerreira (Thu, 27 Aug 2020 08:29:51 GMT):
I'm not behind a proxy; I can run docker-compose for the sawtooth-core build, and it accesses the repositories of the services to download the images with no problems (e.g. step 4 of BUILD.md)

MicaelFerreira (Thu, 27 Aug 2020 08:47:45 GMT):
I figured out the problem: I removed the proxy args from the run-lint.yaml docker-compose file

MicaelFerreira (Thu, 27 Aug 2020 13:44:36 GMT):
I'm facing another issue now: after updating to sawtooth branch 1-3, I'm not able to do any transactions to the validator; it looks like transactions are getting ignored completely. I'm using the latest version of all services; pbft-engine starts with version 1.0.3 and the validator with version 1.3.1.dev1. At the rest-api, the transaction POSTs are successful with an OK status. I have validator verbosity at max and I'm testing transactions with the intkey-tp, but there is no output; it's failing silently. Any help?

amundson (Thu, 27 Aug 2020 14:21:59 GMT):
The date for the next contributor working session is Sept 4, not tomorrow. I had the date incorrect in the working doc; sorry for any confusion.

MicaelFerreira (Fri, 28 Aug 2020 08:35:50 GMT):
Nevermind this, it was my fault

Patrick-Erichsen1 (Mon, 31 Aug 2020 21:35:17 GMT):
Has joined the channel.

amundson (Thu, 03 Sep 2020 17:30:05 GMT):
Reminder, the URL for tomorrow's 10am (US/Central) contributor meeting is https://us02web.zoom.us/j/89262216179

amundson (Fri, 04 Sep 2020 14:58:37 GMT):
The working doc for the meeting is - https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit

Glenn_Gluwa (Tue, 08 Sep 2020 23:03:04 GMT):
Has joined the channel.

infrared (Thu, 10 Sep 2020 17:45:51 GMT):
Has joined the channel.

MicaelFerreira (Fri, 11 Sep 2020 08:55:52 GMT):
Hi guys, I've been having some weird issues regarding the loss of peers from the validator: when for some reason there's an invalid transaction, the validator is ignored by the others (it still has the peers) and so it stops posting batches. It doesn't look like this should be the right behaviour, should it? (I'm sorry in advance; I haven't had the time to check the source code for this)

MicaelFerreira (Fri, 11 Sep 2020 08:56:41 GMT):
Btw, I'm using branch 1-3, with PBFT as consensus and a 5-node network

Dan (Fri, 11 Sep 2020 19:19:28 GMT):
Hi, I need one more reviewer on these (thanks Ryan for being the first) https://github.com/search?q=org:hyperledger+sawtooth+inclusive&type=Issues; the gist is replicating the same inclusive wording change we made in core to the other repos so they are all consistent.

MicaelFerreira (Mon, 14 Sep 2020 15:29:19 GMT):
No thoughts about it? Thank you

MicaelFerreira (Mon, 14 Sep 2020 15:38:37 GMT):
I have another question, and I would like to know the implications: sometimes I have to update dozens to some hundreds of addresses at once in a single transaction. Is there a correct way to do such a thing? Because with around 50 addresses to set, the block takes a minute or 2 to get committed; I wonder what the behaviour would be with some hundreds of addresses to set...

MicaelFerreira (Mon, 14 Sep 2020 15:39:32 GMT):
Is this even feasible at all? Setting all these addresses at once?

amundson (Mon, 14 Sep 2020 17:04:07 GMT):
you can update multiple locations in state with a single call; which SDK?

MicaelFerreira (Mon, 14 Sep 2020 17:15:30 GMT):
python

MicaelFerreira (Mon, 14 Sep 2020 17:19:16 GMT):
Actually, I'm already doing that: I send an array with all the key-values into set_state()

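For anyone following along: in the Python SDK, `set_state` takes a dict of address -> serialized bytes, so many addresses can be written in one call and one transaction. A minimal sketch of a bulk write; the namespace and helper names here are illustrative:

```python
import hashlib

from sawtooth_sdk.processor.exceptions import InvalidTransaction

# Illustrative namespace: first 6 hex chars of the family name's SHA-512.
NAMESPACE = hashlib.sha512(b"myfamily").hexdigest()[:6]

def make_address(key):
    # Standard 70-hex-char address: 6-char namespace + 64-char key hash.
    return NAMESPACE + hashlib.sha512(key.encode()).hexdigest()[:64]

def write_many(context, updates):
    """Write a dict of key -> bytes to state in a single set_state call."""
    entries = {make_address(k): v for k, v in updates.items()}
    # One call, one transaction: every address lands in the same change set.
    addresses_set = context.set_state(entries, timeout=30)
    if len(addresses_set) != len(entries):
        raise InvalidTransaction("not all state entries were set")
```
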
rbuysse (Mon, 14 Sep 2020 19:50:53 GMT):
@MicaelFerreira PR 2331 passes linting if it's rebased on 1-3. There were some lint fixes merged a while ago.

rbuysse (Mon, 14 Sep 2020 19:51:13 GMT):
the block injector test is failing, however.

rbuysse (Mon, 14 Sep 2020 19:51:29 GMT):
```
validator_1 | thread '' panicked at 'BatchInjectorFactory has no method 'create_injectors': PyErr { ptype: , pvalue: Some(ModuleNotFoundError("No module named 'sawtooth_validator.journal.injectors'",)), ptraceback: Some() }', src/journal/publisher.rs:214:14
validator_1 | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
validator_1 | fatal runtime error: failed to initiate panic, error 5
```

rbuysse (Mon, 14 Sep 2020 19:52:35 GMT):
when I add an __init__.py file to `validator/sawtooth_validator/journal/injectors/`, devmode fails with this:

rbuysse (Mon, 14 Sep 2020 19:53:14 GMT):
```
validator_1        | writing file: /etc/sawtooth/keys/validator.priv
validator_1        | writing file: /etc/sawtooth/keys/validator.pub
devmode_1          | INFO  | devmode_engine_rust: | Wait time: 0
validator_1        | Generated config-genesis.batch
settings-tp_1      | INFO  | settings_tp:95 | Console logging level: DEBUG
validator_1        | Processing config-genesis.batch...
validator_1        | Processing config.batch...
validator_1        | Generating /var/lib/sawtooth/genesis.batch
settings-tp_1      | INFO  | sawtooth_sdk::proces | connecting to endpoint: tcp://validator:4004
settings-tp_1      | INFO  | sawtooth_sdk::proces | sending TpRegisterRequest: sawtooth_settings 1.0
validator_1        | [2020-09-14 19:34:10.127 WARNING (unknown file)] [src/pylogger.rs: 40] Started logger at level INFO
settings-tp_1      | INFO  | sawtooth_sdk::proces | Message: d7a02186b57c4d00a5ce6719dd9a602a
devmode_1          | INFO  | sawtooth_sdk::messag | Received Disconnect
settings-tp_1      | INFO  | settings_tp::handler | Setting "sawtooth.settings.vote.authorized_keys" changed to "03d59fb117cf1c6d8653c27c7b09ffe5a6b08855683f4ce35bd53d6fdafe73a870"
devmode_1          | thread 'main' panicked at 'Failed to initialize: ReceiveError("DisconnectedError")', src/engine.rs:72:14
devmode_1          | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
block-info-tp_1    | INFO  | block_info_tp:97 | Console logging level: DEBUG
block-info-tp_1    | INFO  | sawtooth_sdk::proces | connecting to endpoint: tcp://validator:4004
validator_1        | [2020-09-14 19:34:10.454 INFO path] Skipping path loading from non-existent config file: /etc/sawtooth/path.toml
settings-tp_1      | INFO  | sawtooth_sdk::proces | TP_PROCESS_REQUEST sending TpProcessResponse: OK
block-info-tp_1    | INFO  | sawtooth_sdk::proces | sending TpRegisterRequest: block_info 1.0
settings-tp_1      | INFO  | sawtooth_sdk::proces | Message: daf132c5a0eb4459adda18660ef9b9ee
intkey-tp-python_1 | [2020-09-14 19:34:10.516 INFO core] register attempt: OK
validator_1        | [2020-09-14 19:34:10.455 INFO validator] Skipping validator config loading from non-existent config file: /etc/sawtooth/validator.toml
block-info-tp_1    | INFO  | sawtooth_sdk::messag | Received Disconnect
block-info-tp_1    | DEBUG | sawtooth_sdk::messag | Exited stream
block-info-tp_1    | INFO  | sawtooth_sdk::proces | Trying to Reconnect
block-info-tp_1    | INFO  | sawtooth_sdk::proces | connecting to endpoint: tcp://validator:4004
block-info-tp_1    | DEBUG | zmq:489 | socket dropped
block-info-tp_1    | INFO  | sawtooth_sdk::proces | sending TpRegisterRequest: block_info 1.0
validator_1        | [2020-09-14 19:34:10.455 INFO keys] Loading signing key: /etc/sawtooth/keys/validator.priv
latest_devmode_1 exited with code 101
```

jorgeRodriguez (Tue, 15 Sep 2020 02:10:59 GMT):
Has joined the channel.

MicaelFerreira (Wed, 16 Sep 2020 13:31:49 GMT):
Thanks @rbuysse for your help, the PR is now ready to be reviewed. Reviewers, please take a look: https://github.com/hyperledger/sawtooth-core/pull/2331

Dan (Wed, 16 Sep 2020 18:26:02 GMT):
@ltseeley it looks like the Raft build has been broken for the last month. Something to do with the dynamic membership test: https://build.sawtooth.me/job/Sawtooth-Hyperledger/job/sawtooth-raft/view/default/builds https://build.sawtooth.me/job/Sawtooth-Hyperledger/job/sawtooth-raft/job/master/563/execution/node/52/log/ I'm sure you have bigger fish to fry, but I thought I'd point it out. I have a PR blocked on this and thought I'd look for quick fixes, but the membership test is not a quick read for me.

MicaelFerreira (Thu, 17 Sep 2020 10:15:29 GMT):
Just a question for the code owners: there are some Python modules that are not used any more, right? For example, the module at /journal/validation_rule_enforcer.py. I wonder what this code is doing here, because the same logic is implemented in Rust, and that is the version actually being used. Please correct me if I'm wrong.

MicaelFerreira (Thu, 17 Sep 2020 10:39:40 GMT):
Following on, if my previous statement is correct, validator/tests/test_validator_rule_enforcer is testing something that is not used.

ltseeley (Thu, 17 Sep 2020 14:05:55 GMT):
@MicaelFerreira that's correct. The components you mentioned were recently re-written in Rust. It's an ongoing effort to convert everything from Python -> Rust.

ltseeley (Thu, 17 Sep 2020 14:06:59 GMT):
@Dan I'll add that to my backlog :slightly_smiling_face:

MicaelFerreira (Thu, 17 Sep 2020 14:34:13 GMT):
Great. I just mentioned this because the sawtooth-core architecture is not easy to read, and when someone is not entirely into the architecture (like me), things start to get confusing =)

ltseeley (Thu, 17 Sep 2020 14:37:59 GMT):
That's definitely something we're working on improving!

MicaelFerreira (Fri, 18 Sep 2020 13:31:41 GMT):
Can someone clarify something about the integration tests for me: I have only one test, and after the run it shows "Ran 1 test in 0.465s ok", but the coverage is not 100%; it reports "135 Stmts and 36 Miss". What are those?

MicaelFerreira (Mon, 21 Sep 2020 13:10:31 GMT):
https://github.com/hyperledger/sawtooth-core/pull/2331 re-pushed with integration tests. Reviewers, please take a look.

MicaelFerreira (Tue, 22 Sep 2020 17:26:37 GMT):
Got this error at the rest-api:
```
[2020-09-22 17:24:00.064 DEBUG route_handlers] Sending CLIENT_BATCH_STATUS_REQUEST request to validator
[2020-09-22 17:24:02.227 DEBUG state_delta_subscription_handler] Received event a23ebb2a: 3 changes
[2020-09-22 17:24:02.229 DEBUG state_delta_subscription_handler] Updating 3 subscribers
[2020-09-22 17:24:02.230 WARNING selector_events] socket.send() raised exception.
[2020-09-22 17:24:02.246 DEBUG route_handlers] Received CLIENT_BATCH_STATUS_RESPONSE response from validator with status OK
[2020-09-22 17:24:02.256 INFO helpers] GET /batch_statuses?id=4c9d375a7a189e1c97fea09ddcb59c4a8e7b6aadec57a488da65e8664037ccf013e6da00cb7482f9219055aec7b7033020e612dbea972d71e73e0637694093e2&wait=3 HTTP/1.1: 200 status, 610 size, in 2.192721 s
[2020-09-22 17:24:12.978 ERROR web_protocol] Error handling request
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/aiohttp/web_protocol.py", line 231, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
  File "aiohttp/_http_parser.pyx", line 295, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: invalid HTTP method
```

MicaelFerreira (Tue, 22 Sep 2020 17:30:02 GMT):
rest-api recovered by itself

arsulegai (Sun, 27 Sep 2020 12:21:09 GMT):
^ This is a pending ask from the community to port the Rest API component to the Rust language. @MicaelFerreira

MicaelFerreira (Mon, 28 Sep 2020 09:10:15 GMT):
Need more reviews on this PR, https://github.com/hyperledger/sawtooth-core/pull/2331, please take a look

MicaelFerreira (Wed, 30 Sep 2020 16:57:21 GMT):
Hey reviewers, I had to fix a test that was running twice (pointed out by @agunde on the PR https://github.com/hyperledger/sawtooth-core/pull/2331); please re-check it.

amundson (Fri, 02 Oct 2020 14:16:10 GMT):
Reminder, the contributor meeting is at 10am US/Central - in about 45 minutes -- the zoom URL is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and agenda is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit

wkatsak (Fri, 02 Oct 2020 14:26:44 GMT):
Hey, @amundson I’ve been in the waiting room for a while now

wkatsak (Fri, 02 Oct 2020 14:26:49 GMT):
Is the meeting going on?

wkatsak (Fri, 02 Oct 2020 14:27:12 GMT):
Ack, never mind. Timezone difference

arsulegai (Fri, 02 Oct 2020 15:19:16 GMT):
Meeting invite is updated in the Hyperledger's public calendar https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09

MicaelFerreira (Tue, 06 Oct 2020 16:35:36 GMT):
Hi, what is this warning during the catch-up process? A non-deterministic issue?
```
[2020-10-06 16:34:17.199 WARNING (unknown file)] [src/journal/block_validator.rs: 284] Error during block validation: BlockValidationError("During validate_on_chain_rules, error creating settings view: NotFound(\"be3c5582f7bad9229c0809a202ecdff72d7142e00a5201c44ce5ade0ad34684c\")")
[2020-10-06 16:34:17.365 WARNING (unknown file)] [src/journal/block_validator.rs: 284] Error during block validation: BlockValidationError("During validate_on_chain_rules, error creating settings view: NotFound(\"9a4d1201b6373887c04e2d251856180cd8e153378f4dc3d28d3478c925030b72\")")
```

danintel (Tue, 06 Oct 2020 18:53:49 GMT):
Has left the channel.

arsulegai (Mon, 12 Oct 2020 17:03:36 GMT):
@MicaelFerreira could this be the same issue that was discussed a few days ago?

csunitha (Thu, 22 Oct 2020 10:05:29 GMT):
Has joined the channel.

kenty (Thu, 29 Oct 2020 07:49:59 GMT):
Hello everyone, my name is Kent! Arun has asked me to convert something in the FAQ folder to Markdown, and I have chosen consensus.rst as my starting point. I am also coordinating with Mr Jin in China and I will translate some/all of the docs into Chinese. https://github.com/hyperledger/sawtooth-docs/blob/refresh/faq/consensus.rst

amundson (Thu, 29 Oct 2020 19:27:03 GMT):
Reminder, there is a contributor meeting tomorrow at 10am US/Central -- the zoom is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and agenda is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit

SimonCritchley (Sat, 31 Oct 2020 12:19:54 GMT):
Has joined the channel.

arsulegai (Sun, 01 Nov 2020 15:01:02 GMT):
@amundson having a separate branch for language translation makes sense to start with. Let's have @kenty and Mr Jin raise the current translation they have against that branch. An alternative could be to have a hyperledger-labs project, and I am happy to sponsor it, including help with technical matters, if any, for build and test. Note that other projects have followed this approach.

crypto_beep (Wed, 04 Nov 2020 17:02:37 GMT):
Has joined the channel.

omerporze (Mon, 09 Nov 2020 19:38:24 GMT):
Has joined the channel.

Vikash2601 (Tue, 10 Nov 2020 04:44:03 GMT):
Has joined the channel.

RajaramKannan (Wed, 18 Nov 2020 03:15:20 GMT):
The validator on one of the nodes crashed today (after being up for ~140 days). The only thing we can see is an exception a few seconds before it crashed:
```
Exception in callback _AsyncSocket._handle_send() handle:
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
    self._callback(*self._args)
  File "/usr/lib/python3/dist-packages/zmq/eventloop/future.py", line 280, in _handle_send
    f.set_result(result)
  File "/usr/lib/python3.5/asyncio/futures.py", line 332, in set_result
    self._schedule_callbacks()
  File "/usr/lib/python3.5/asyncio/futures.py", line 236, in _schedule_callbacks
    callbacks = self._callbacks[:]
AttributeError: 'Future' object has no attribute '_callbacks'
```
It crashed a couple of seconds after that. Any ideas on why this might have happened? We are on 1.1.15.

crypto_beep (Wed, 18 Nov 2020 15:44:21 GMT):
Hi, greetings! I didn't find any commands to list all validator nodes or all genesis nodes in the docs or wiki for Hyperledger Sawtooth v1.2.4 using the Docker approach. Could you please help me out with this? Thanks!

omerporze (Thu, 19 Nov 2020 20:53:45 GMT):
Hi, is there anyone here that can help me setup Seth with more than 1 validator?

arsulegai (Fri, 27 Nov 2020 16:01:21 GMT):
Do we have the Sawtooth Core Working Session call today?

MicaelFerreira (Fri, 04 Dec 2020 15:17:41 GMT):
Hi, I hit an error at the rest-api that I had never seen before. Restarting the network solved it, but I wonder what could cause it:
```
[2020-12-04 14:38:33.072 DEBUG route_handlers] Sending CLIENT_BLOCK_LIST_REQUEST request to validator
[2020-12-04 14:38:33.462 ERROR state_delta_subscription_handler] Unable to fetch latest block id
[2020-12-04 14:38:33.462 DEBUG route_handlers] Received CLIENT_BLOCK_LIST_RESPONSE response from validator with status NOT_READY
[2020-12-04 14:38:33.463 INFO helpers] GET /blocks?limit=1 HTTP/1.1: 503 status, 385 size, in 0.391587 s
[2020-12-04 14:38:33.464 DEBUG state_delta_subscription_handler] Starting subscriber from
[2020-12-04 14:38:33.465 ERROR state_delta_subscription_handler] unable to subscribe!
[2020-12-04 14:38:33.465 DEBUG state_delta_subscription_handler] Sending initial most recent event to new subscriber
[2020-12-04 14:38:33.465 DEBUG state_delta_subscription_handler] Subscribing to state delta events
[2020-12-04 14:38:58.215 DEBUG state_delta_subscription_handler] Sending initial most recent event to new subscriber
```

wkatsak (Mon, 14 Dec 2020 18:53:38 GMT):
Hello everyone. We at Taekion have discovered a failure mode in Sawtooth/PBFT that persists into the latest release (1.2.6). The application that we are developing has a pattern of streams of transactions: each of many particular contexts has its own stream, and within a stream each transaction is explicitly declared as dependent on the transaction before it. What happens in our testing is that eventually the cluster will lock up hard: a transaction will get stuck in PENDING, and all subsequent transactions will either stay pending or get explicitly rejected as missing a dependency. The cluster can be restarted, but anything that was pending will be lost, and we have to resubmit it. The big issue is that we can make this happen quite reliably with our application. We have been chasing this for about a month internally.

wkatsak (Mon, 14 Dec 2020 18:54:20 GMT):
When it is stuck, we cannot even resubmit the stuck transaction, because it still shows as PENDING, and thus the system will not take it as a resubmit.

wkatsak (Mon, 14 Dec 2020 18:55:38 GMT):
I've managed to build a tester that replicates the issue WITHOUT any of our proprietary app. My tester constructs a series of intkey transactions, with dependencies between each, and submits them in batch lists, with one transaction per batch.
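
For readers following along, here is a minimal sketch of that dependent-transaction pattern, assuming the Python sawtooth_sdk and the standard intkey family (CBOR payload, sha512-based addressing). It is illustrative only, not the taekion tool itself, and the helper names are invented:
```python
# Sketch: build a chain of intkey transactions where each transaction
# explicitly depends on the previous one, one transaction per batch.
import cbor
from hashlib import sha512
from uuid import uuid4

from sawtooth_signing import create_context, CryptoFactory
from sawtooth_sdk.protobuf.transaction_pb2 import Transaction, TransactionHeader
from sawtooth_sdk.protobuf.batch_pb2 import Batch, BatchHeader, BatchList

INTKEY_PREFIX = sha512('intkey'.encode()).hexdigest()[:6]

def intkey_address(name):
    return INTKEY_PREFIX + sha512(name.encode()).hexdigest()[-64:]

context = create_context('secp256k1')
signer = CryptoFactory(context).new_signer(context.new_random_private_key())

def make_batch(verb, name, value, depends_on=None):
    payload = cbor.dumps({'Verb': verb, 'Name': name, 'Value': value})
    address = intkey_address(name)
    header = TransactionHeader(
        signer_public_key=signer.get_public_key().as_hex(),
        family_name='intkey',
        family_version='1.0',
        inputs=[address],
        outputs=[address],
        # Explicit dependency on the previous transaction's signature;
        # this is the pattern that triggers the reported lock-up.
        dependencies=[depends_on] if depends_on else [],
        payload_sha512=sha512(payload).hexdigest(),
        batcher_public_key=signer.get_public_key().as_hex(),
        nonce=uuid4().hex,
    ).SerializeToString()
    txn = Transaction(
        header=header, header_signature=signer.sign(header), payload=payload)
    batch_header = BatchHeader(
        signer_public_key=signer.get_public_key().as_hex(),
        transaction_ids=[txn.header_signature]).SerializeToString()
    return txn.header_signature, Batch(
        header=batch_header,
        header_signature=signer.sign(batch_header),
        transactions=[txn])

# A chain of 10 dependent transactions, one per batch.
prev, first = make_batch('set', 'stress', 0)
batches = [first]
for _ in range(9):
    prev, b = make_batch('inc', 'stress', 1, depends_on=prev)
    batches.append(b)
payload = BatchList(batches=batches).SerializeToString()
# POST payload to the REST API, e.g.:
# requests.post('http://localhost:8008/batches', data=payload,
#               headers={'Content-Type': 'application/octet-stream'})
```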

wkatsak (Mon, 14 Dec 2020 18:56:21 GMT):
I've just verified that I can replicate the issue as I said, with v1.2.6. My testing environment is a docker-compose environment on my workstation with 5 validators.

wkatsak (Mon, 14 Dec 2020 18:57:40 GMT):
The testing code is at https://github.com/taekion-org/Sawtooth-intkey-stress-test. You will want to run this tool in "sync" mode.

wkatsak (Mon, 14 Dec 2020 18:57:54 GMT):
I also have a Docker image published at https://hub.docker.com/repository/docker/taekion/intkey-stress-test

wkatsak (Mon, 14 Dec 2020 18:58:50 GMT):
@amundson @jamesbarry @kodonnel

wkatsak (Mon, 14 Dec 2020 21:14:47 GMT):
Just a followup. On my workstation, I run the test as follows: `intkey-stress-test sync --max_submit 250 --max_pending 1000`.

wkatsak (Mon, 14 Dec 2020 21:15:36 GMT):

wkatsak - Mon Dec 14 2020 16:15:26 GMT-0500 (Eastern Standard Time).txt

amundson (Tue, 15 Dec 2020 17:31:20 GMT):
@wkatsak cool that you have it reduced to a repeatable test case. how long do you need to run the tool to get the error condition?

wkatsak (Tue, 15 Dec 2020 17:59:42 GMT):
@amundson It varies. Usually within a few hours it will get stuck. Sometimes, it is a matter of minutes.

wkatsak (Tue, 15 Dec 2020 18:00:00 GMT):
You can recover the cluster by restarting it, but it seems to get stuck faster after it's been stuck once.

wkatsak (Tue, 15 Dec 2020 18:00:14 GMT):
That part is only an impression though, I don't have hard data.

wkatsak (Tue, 15 Dec 2020 18:01:05 GMT):
When it is stuck, the throttling mechanism will be triggered, and no batches can come in.

wkatsak (Tue, 15 Dec 2020 18:01:45 GMT):
At least on the validator that I am submitting through.

LeonardoCarvalho (Sat, 26 Dec 2020 19:03:49 GMT):
I am having some fun reading the Sawtooth LMDB files in Java; where can I get a description of the data transformation to/from the lmdb.rs layer?

erivlis (Wed, 13 Jan 2021 00:17:57 GMT):
Has joined the channel.

amundson (Thu, 21 Jan 2021 18:02:34 GMT):
Hi reminder that the next Sawtooth Contributor Session is tomorrow at 10am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit#

arsulegai (Thu, 21 Jan 2021 18:43:35 GMT):
@kenty please join this call. We can discuss the PR you raised.

kenty (Fri, 22 Jan 2021 11:42:06 GMT):
Thanks Arun, I have received some feedback on my markdown conversion and will be cleaning my code ASAP. Catch you all at the meeting :)

arsulegai (Fri, 22 Jan 2021 17:27:45 GMT):
@jsmitchell I didn't understand the topic of determinism in the meeting. Maybe it was me having sleepy eyes on a Friday night. Can you please give me a reference to it on other projects?

amundson (Fri, 22 Jan 2021 17:55:36 GMT):
basically, if you are running inside the wasm container, it is fairly easy to get some level of determinism because you are naturally limited by what you can do. you can't do non-deterministic things like access external resources. however, to be fully deterministic, you need to solve the halting problem as well -- which means you have to guarantee that the execution of the smart contract will end. ethereum's approach to this is to limit the number instructions that can be executed based on gas (each instruction takes some gas, there is a limit to the amount of gas available).
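
To make the gas idea concrete, here is a toy sketch (not Sawtooth, Sabre, or Ethereum code; everything in it is invented) of a metered interpreter loop that charges per instruction and aborts when the budget is exhausted, guaranteeing halting even for a program that would loop forever:
```python
# Toy gas-metered interpreter: every instruction costs gas, so execution
# always terminates even if the "contract" contains an infinite loop.
class OutOfGas(Exception):
    pass

def run_metered(instructions, gas_limit, cost_per_instruction=1):
    gas = gas_limit
    pc = 0
    while pc < len(instructions):
        gas -= cost_per_instruction
        if gas < 0:
            raise OutOfGas('execution exceeded {} gas'.format(gas_limit))
        op = instructions[pc]
        # A backward 'jump' can loop forever, but the meter still halts it.
        pc = op['target'] if op['kind'] == 'jump' else pc + 1

# An infinite loop: jump back to instruction 0 forever.
looping_program = [{'kind': 'jump', 'target': 0}]
try:
    run_metered(looping_program, gas_limit=1000)
except OutOfGas as e:
    print(e)  # execution exceeded 1000 gas
```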

amundson (Fri, 22 Jan 2021 17:56:56 GMT):
if you ignore the halting problem, sabre already does a pretty good job in this area when compared to TPs because TPs can "do anything" including accessing external resources (that might change) or random number generation.

arsulegai (Fri, 22 Jan 2021 18:02:53 GMT):
Got it. Thanks for the clarification.

amundson (Thu, 18 Feb 2021 16:44:35 GMT):
Reminder that the next Sawtooth Contributor Session is tomorrow at 10am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

wkatsak (Thu, 18 Feb 2021 23:22:16 GMT):
Thanks @amundson I'll be on the call. We have been running around like crazy here with product work, but we are still chasing the Sawtooth bug that I reported a month or two ago.

wkatsak (Thu, 18 Feb 2021 23:23:34 GMT):
Just to notate it before the discussion tomorrow, it usually gets triggered when the system is presented with a high volume of DEPENDENT transactions, e.g. a collection of transaction chains where each transaction has an explicit dependency on the transaction before.

wkatsak (Thu, 18 Feb 2021 23:25:00 GMT):
We see it running PBFT. When it happens, usually it is preceded by the validator returning an error to a call from PBFT, at which point PBFT drifts off the rails.

wkatsak (Thu, 18 Feb 2021 23:25:09 GMT):
The end result is that transactions stop.

romanmoz (Thu, 18 Feb 2021 23:38:35 GMT):
Has joined the channel.

amundson (Fri, 19 Feb 2021 14:36:18 GMT):
@wkatsak is the client re-submitting transactions if they get dropped?

arsulegai (Fri, 19 Feb 2021 15:11:31 GMT):
Waiting for the host to start meeting

agunde (Fri, 19 Feb 2021 15:12:30 GMT):
@arsulegai the meeting is not for another hour

arsulegai (Fri, 19 Feb 2021 15:13:28 GMT):
Oops! Right, with daylight savings it'll be at this time

arsulegai (Fri, 19 Feb 2021 17:15:45 GMT):
For mentorship program, submit proposals here https://wiki.hyperledger.org/display/INTERN/Cactus-samples+-+Business+Logic+Plugins+for+Hyperledger+Cactus

jmbarry (Fri, 19 Feb 2021 17:28:19 GMT):
If you are interested in giving a talk on Sawtooth for the Hyperledger Global Forum, I have set up a Slack Forum off of our Slack. Please send james@taekion.com your email and I will add you. Thanks!

amundson (Fri, 19 Feb 2021 17:29:39 GMT):
can't we keep that discussion here?

jmbarry (Fri, 19 Feb 2021 17:30:28 GMT):
Sure if you would like to we certainly can.

jmbarry (Fri, 19 Feb 2021 17:31:48 GMT):
I think there should be a topic on Sawtooth 1.2 and how to use it, the Sawtooth roadmap, Kent and his Sawtooth on the Raspberry Pi, perhaps some commercial uses, Transact, Grid, and others?

arsulegai (Fri, 19 Feb 2021 17:37:25 GMT):
I guess there should be talks like "Set up your first Sawtooth app in 5 steps", "Top 5 concepts that make Sawtooth stand out from the rest", "Cool projects built on Sawtooth", "Best design patterns for Sawtooth", etc.

arsulegai (Fri, 19 Feb 2021 17:42:31 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=CoP9JaaGmsCejRZct) This could be an avenue for experimentation, in addition to getting things built from scratch in labs. I like the idea Shawn proposed, we should put it in.

LeonardoCarvalho (Sat, 20 Feb 2021 12:28:17 GMT):
I am interested in replacing the Validator Interconnect with a Java component; where can I find documentation or code on the communication with the other components?

amundson (Mon, 22 Feb 2021 18:07:22 GMT):
If you are re-implementing 1.x, then the code/protobuf in the 1-2 or 1-3 branches. If you are targeting 2.x compatibility, then libsplinter's transport module.

LeonardoCarvalho (Tue, 23 Feb 2021 10:54:18 GMT):
cool!

gilescope (Thu, 25 Feb 2021 17:27:04 GMT):
Has joined the channel.

amundson (Fri, 26 Feb 2021 18:07:15 GMT):
@jmbarry re:HGF do you know if there is still a fee this year like past years? I just remembered why we've never presented there.

arsulegai (Sat, 27 Feb 2021 07:27:24 GMT):
@amundson fee is for attending. This year it shows to be $50 per attendee. Speakers, program committee, sponsors (limited seats) are given free coupon codes.

LeonardoCarvalho (Mon, 01 Mar 2021 11:28:27 GMT):
I am getting this message on the Rust validator devmode 0.1: "Removed 5 incomplete batches from the schedule". How can I track down the cause of that? My TP got the TP_STATE_SET_RESPONSE for the 5 SET operations, but the message appears anyway.

LeonardoCarvalho (Mon, 01 Mar 2021 11:31:22 GMT):
docker images from 1.2.6

arsulegai (Mon, 01 Mar 2021 14:56:04 GMT):
I assume that message is printed at the time of creating the candidate block. Incomplete batches for that ask from the consensus engine (that round of consensus) are removed. Maybe the consensus engine asked to finalize the block before these batches were executed. Nothing to worry about; do you see log traces that indicate this flow?

LeonardoCarvalho (Wed, 03 Mar 2021 11:37:49 GMT):
Today I tried again, and:
```
st-validator_1_2_6 | [2021-03-03 11:30:24.679 DEBUG scheduler_serial] Removed 5 incomplete batches from the schedule
<....> 5 minutes later <...>
st-devmode-engine-rust_1_2_4 | thread 'main' panicked at 'Failed to summarize block: ReceiveError("TimeoutError")', src/engine.rs:87:23
st-devmode-engine-rust_1_2_4 | stack backtrace:
st-devmode-engine-rust_1_2_4 | 0: 0x557c338f3205 -
st-devmode-engine-rust_1_2_4 | 1: 0x557c339135ec -
st-devmode-engine-rust_1_2_4 | 2: 0x557c338f0a13 -
st-devmode-engine-rust_1_2_4 | 3: 0x557c338f5980 -
st-devmode-engine-rust_1_2_4 | 4: 0x557c338f56cc -
st-devmode-engine-rust_1_2_4 | 5: 0x557c338f5fb7 -
st-devmode-engine-rust_1_2_4 | 6: 0x557c338f5bbb -
st-devmode-engine-rust_1_2_4 | 7: 0x557c339129f1 -
st-devmode-engine-rust_1_2_4 | 8: 0x557c33912813 -
st-devmode-engine-rust_1_2_4 | 9: 0x557c33817d2f -
st-devmode-engine-rust_1_2_4 | 10: 0x557c3381dc2b -
st-devmode-engine-rust_1_2_4 | 11: 0x557c3382b2e6 -
st-devmode-engine-rust_1_2_4 | 12: 0x557c3381cda3 -
st-devmode-engine-rust_1_2_4 | 13: 0x557c338f6388 -
st-devmode-engine-rust_1_2_4 | 14: 0x557c3382b7c2 -
st-devmode-engine-rust_1_2_4 | 15: 0x7f109f5f9b97 - __libc_start_main
st-devmode-engine-rust_1_2_4 | 16: 0x557c3381223a -
st-devmode-engine-rust_1_2_4 | 17: 0x0 -
st-devmode-engine-rust_1_2_4 exited with code 101
```

LeonardoCarvalho (Wed, 03 Mar 2021 11:38:43 GMT):
even with RUST_BACKTRACE=full, no extra information

LeonardoCarvalho (Wed, 03 Mar 2021 11:40:08 GMT):
I have only one node of each type, could that be an issue?

amundson (Fri, 05 Mar 2021 14:34:19 GMT):
@LeonardoCarvalho try building in non-release mode (remove --release from cargo) to get a better backtrace

LeonardoCarvalho (Tue, 09 Mar 2021 10:19:57 GMT):

Clipboard - March 9, 2021 7:19 AM

LeonardoCarvalho (Tue, 09 Mar 2021 10:20:02 GMT):
I've tried, but I am getting this failure:

arsulegai (Tue, 09 Mar 2021 12:27:43 GMT):
@jmbarry @amundson we will have to submit the abstract by the 12th

amundson (Tue, 09 Mar 2021 14:55:55 GMT):
@LeonardoCarvalho I think that's the error when protobuf is missing (required in build.rs)

LeonardoCarvalho (Wed, 10 Mar 2021 11:29:53 GMT):
hm, ok, so how do I build that?

LeonardoCarvalho (Wed, 10 Mar 2021 11:29:59 GMT):
I really miss a manual or howto...

LeonardoCarvalho (Wed, 10 Mar 2021 12:02:19 GMT):
Alas, as constructive criticism, I think that creating language-agnostic documentation would help Sawtooth's adoption a lot in some companies.

Helen_Garneau (Wed, 10 Mar 2021 14:56:10 GMT):
Has joined the channel.

Helen_Garneau (Wed, 10 Mar 2021 14:56:45 GMT):
Hello Sawtooth Developers! Reminder to please join the DevRel Marketing Committee call at 9am PT today. Take a look at the agenda and add items if you'd like here: https://wiki.hyperledger.org/x/Nqx6Ag (note new Zoom info)

jmbarry (Fri, 12 Mar 2021 21:40:46 GMT):
Has anyone submitted sessions for Sawtooth? Anything that we can chain together for a "Sawtooth track"? My company is two of us coding at this point in time, and we have been stymied in coming up with any topics that make sense for this conference.

jmbarry (Fri, 12 Mar 2021 21:43:08 GMT):
@arsulegai had put a lot of suggestions for topics for the Hyperledger Global Forum https://www.hyperledger.org/event/hyperledger-global-forum-2021 Topics:
* Sawtooth 1.2 - how to use
* Sawtooth Roadmap, Kent and his team
* Sawtooth on the Raspberry Pi
* Sawtooth commercial uses
* Transact
* Grid
* Setup your first Sawtooth app in 5 steps
* Top 5 concepts that make Sawtooth stand out from the rest
* Transaction processor use cases
* Cool projects built on Sawtooth
* Best of design patterns for Sawtooth
* Real life use case of setting up the Tel Aviv Stock Exchange on Sawtooth. Duncan from Blockchain Technology Partners

jmbarry (Fri, 12 Mar 2021 21:43:50 GMT):
@amundson is Bitwise going to propose any sessions with an abstract?

amundson (Fri, 12 Mar 2021 21:52:21 GMT):
Cargill is planning to submit a talk on Grid

jmbarry (Fri, 12 Mar 2021 21:55:59 GMT):
OK any others? You guys have the real Sawtooth expertise....

amundson (Fri, 12 Mar 2021 22:00:30 GMT):
I think the Grid one will be good to showcase Sawtooth/Transact/Splinter tech but through more of a business lens.

amundson (Fri, 12 Mar 2021 22:01:48 GMT):
I was considering a Sawtooth 2 one but I haven't had an opportunity to write up a submission and not even enough time to successfully track down the template, etc.

jmbarry (Fri, 12 Mar 2021 22:16:44 GMT):
@amundson the template is here https://events.linuxfoundation.org/hyperledger-global-forum/program/cfp/

gilescope (Mon, 15 Mar 2021 18:19:23 GMT):
am curious why cpython was chosen over pyo3? was that because it was simpler and python won't be in the picture long term?

pschwarz (Mon, 15 Mar 2021 19:11:44 GMT):
At the time, pyo3 only compiled on Rust nightly, and we have set a rule that Sawtooth must compile on stable

gilescope (Tue, 16 Mar 2021 10:50:45 GMT):
ah ok. took a little work to get things working on stable but they got there last year. Am amazed how much progress they achieved last year to be honest.

gilescope (Tue, 16 Mar 2021 10:52:08 GMT):
I am guessing you're not after a pyo3 switchover pr.

LeonardoCarvalho (Thu, 18 Mar 2021 11:13:27 GMT):
ok, after some painful tcpdumping and debugging, I figured out that over a couple of merges I entangled the context ids for the messages. Shouldn't the validator validate context ids and send back an error message?

arsulegai (Thu, 18 Mar 2021 14:28:11 GMT):
[ ](https://chat.hyperledger.org/channel/sawtooth-core-dev?msg=rLZSoT6ru8Q68Rjyg) I was away from RocketChat for a while, saw this now. Tomorrow is the last day for submission. Did you submit a talk?

amundson (Thu, 18 Mar 2021 14:36:09 GMT):
I have not submitted one, I'm not sure about others. TBH, simply too stressful for me to present in a semi-public forum. Happy to support others that are more comfortable, including on content development.

amundson (Thu, 18 Mar 2021 20:46:45 GMT):
Reminder that the next Sawtooth Contributor Session is tomorrow at 10am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

LeonardoCarvalho (Wed, 31 Mar 2021 11:44:15 GMT):
ok, managed to get a fully functional and stable transaction processor, now I will apply the Saint Exupéry Refactoring to remove the noise from going in and out of the coding... :)

amundson (Thu, 15 Apr 2021 13:57:11 GMT):
Reminder that the next Sawtooth Contributor Session is tomorrow at 10am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

amundson (Fri, 16 Apr 2021 16:00:16 GMT):
Thanks all for the conversation - this is the book I referenced - https://www.springer.com/gp/book/9783642152597

arsulegai (Fri, 16 Apr 2021 16:01:50 GMT):
^ @jmbarry @wkatsak Shawn did a walkthrough of consensus library proposal.

gilescope (Thu, 22 Apr 2021 16:59:27 GMT):
I was looking over the validator Rust code in main. Is it fair to say that at the moment everything has a Rust interface, but that interface calls over to the Python module? (I didn't see any gossip or zmq related dependencies in Cargo.toml, which got me wondering.)

amundson (Fri, 30 Apr 2021 17:45:14 GMT):
@gilescope the general direction we are taking is to move all the functionality to libsawtooth, libtransact, libsplinter, etc. and the validator will just pull all of that together. python is there for components we haven't worked through yet. so for example, since libsplinter will provide the networking layer but isn't currently being used, the network stuff is largely still in python at the moment

lucgerrits (Wed, 05 May 2021 08:05:27 GMT):
Has joined the channel.

pushkarb (Wed, 26 May 2021 07:48:06 GMT):
Has joined the channel.

Dan (Wed, 26 May 2021 14:37:18 GMT):
anyone have a positive or negative experience with grpc auth features? https://grpc.io/docs/guides/auth/

pushkarb (Sun, 30 May 2021 15:39:27 GMT):
I need to modify the intkey transaction family for a specific purpose, but I am unable to find the code for intkey in the sawtooth-core repository on GitHub.

amundson (Tue, 01 Jun 2021 16:17:22 GMT):
@Dan no, but we found grpc generally performed exceptionally poorly in Python, which is why it wasn't used initially

pushkarb (Fri, 11 Jun 2021 11:09:20 GMT):
Hello everyone, I need some help. I am trying to make changes to the intkey transaction family. I have cloned the Sawtooth Python SDK. The problem I am facing is that if I make changes to the intkey code present in `\sawtooth-sdk-python\sawtooth-sdk-python-main\examples\intkey_python\sawtooth_intkey\client_cli`, the changes aren't reflected in the binaries that I am building using Docker. What am I missing?

nage (Mon, 21 Jun 2021 17:29:53 GMT):
Has left the channel.

AbhijeetBH 2 (Sat, 03 Jul 2021 12:51:32 GMT):
Has joined the channel.

AbhijeetBH 2 (Sat, 03 Jul 2021 13:02:05 GMT):
Hello. Need help. Our production node stopped working with the following error on 4 out of 10 nodes:
```
[2021-07-03 12:15:34.266 INFO (unknown file)] [src/journal/block_validator.rs: 265] Block ea5fcd1fdd38365b47881c2511c325d6690adca26d4ba63ffad35a6a7b61bdff27d74c7d77716112d38ffa2fba6fbf9984f81d36721f05aa79ce99bfcb753c33 passed validation
[2021-07-03 12:15:34.289 INFO ffi] [src/journal/chain.rs: 557] Failed block Block(id: ea5fcd1fdd38365b47881c2511c325d6690adca26d4ba63ffad35a6a7b61bdff27d74c7d77716112d38ffa2fba6fbf9984f81d36721f05aa79ce99bfcb753c33, block_num: 30710, state_root_hash: fa7631f2dfb7f674ec361e379a205c9909fb00ee919c11c65b5c0c30015ff5e9, previous_block_id: 33e03d392caa491a51a8262124ac2eb0ce340742cf63950a80905108b46ab6a16d6a84d7aba4aacd9ee17fbd0f30fa3a41ec2ddc26bfbf5f7848214fd62b42
```
These 4 nodes are on 30686 while the remaining 6 are on 30710. Now the blockchain has stopped altogether.

arsulegai (Sun, 04 Jul 2021 05:25:12 GMT):
cc: @pschwarz @amundson

AbhijeetBH 2 (Sun, 04 Jul 2021 06:32:04 GMT):
@arsulegai After copying the lmdb files from a node which was ahead to the nodes which were failing validation, the blockchain is working again now. But I think monitoring is required. We have lost 90% of submitted transactions as either failed blocks or unknown status.

AbhijeetBH 2 (Mon, 05 Jul 2021 05:36:55 GMT):
Guys, once an invalid transaction is submitted to the blockchain, it takes around 15 minutes and keeps retrying that invalid transaction. During this time, if any new valid transaction is submitted, it stays in the pending state. This is a worrisome situation, since availability takes a hit just because of a single invalid transaction. What is a recommended way of suppressing this behaviour?

arsulegai (Mon, 05 Jul 2021 18:43:34 GMT):
Abhijeet, you have a 2 in your name, so I am unable to tag you. The validator node will not block itself from receiving new requests. It will keep at least one transaction, but if it is the only transaction then you see this error. Your validator should still receive newer transactions. One of the solutions I have come across so far is to not fail a transaction, but rather model your contract to know it is invalid by storing the status on chain. Another solution is to refresh the validator's runtime memory by restarting it. Note: a better solution to this, i.e. syncing the failed transaction status, has been added as a feature in Transact. Most probably in the next version you can expect a feature around this.
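
A minimal sketch of that first approach with the Python SDK, assuming a JSON payload; `validate`, `apply_update`, the family name, and the addressing scheme are hypothetical stand-ins for the application's own logic:
```python
# Sketch of the "don't fail the transaction" pattern: instead of raising
# InvalidTransaction (which leaves the batch retrying in the pending pool),
# commit a 'rejected' marker to state so the outcome is visible on-chain.
import hashlib
import json

from sawtooth_sdk.processor.handler import TransactionHandler

FAMILY = 'example'  # hypothetical family name
NAMESPACE = hashlib.sha512(FAMILY.encode()).hexdigest()[:6]

def _status_address(txn_id):
    # Hypothetical per-transaction status address within the namespace.
    return NAMESPACE + hashlib.sha512(txn_id.encode()).hexdigest()[-64:]

class RecordingHandler(TransactionHandler):
    @property
    def family_name(self):
        return FAMILY

    @property
    def family_versions(self):
        return ['1.0']

    @property
    def namespaces(self):
        return [NAMESPACE]

    def apply(self, transaction, context):
        payload = json.loads(transaction.payload.decode())
        error = validate(payload)  # app-specific business-rule check (stand-in)
        if error is not None:
            # Record the rejection in state instead of raising
            # InvalidTransaction, so the batch settles normally.
            context.set_state({
                _status_address(transaction.header_signature):
                    json.dumps({'status': 'rejected',
                                'reason': error}).encode()})
            return
        apply_update(context, payload)  # the normal state change (stand-in)
```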

AbhijeetBH 2 (Tue, 06 Jul 2021 05:51:12 GMT):
@arsulegai I am unable to change my name. I have been using the second solution so far, i.e. restarting all the validators, but I like the first solution, i.e. storing the status. Still, any runtime error in the Python contract results in an invalid transaction.

AbhijeetBH 2 (Thu, 08 Jul 2021 16:38:36 GMT):
PBFT has been breaking regularly now on production. A few nodes are always left behind, while a few keep adding blocks. I have no clue what's going on. Then after some time, the node which is behind starts failing the blocks.

AbhijeetBH 2 (Thu, 08 Jul 2021 16:40:04 GMT):
Out of 10 nodes, 9 nodes have already committed the blocks, while one node keeps giving errors.

AbhijeetBH 2 (Thu, 08 Jul 2021 16:40:41 GMT):

sawtooth-8689d6799c-2bqx5-pbftengine.txt

AbhijeetBH 2 (Thu, 08 Jul 2021 16:41:36 GMT):
These are the logs. @pschwarz @amundson

amundson (Thu, 08 Jul 2021 21:09:20 GMT):
Reminder that the next Sawtooth Contributor Session is tomorrow at 10am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

Helen_Garneau (Tue, 13 Jul 2021 13:27:56 GMT):
Hello Sawtooth Developers! Reminder to please join the DevRel Marketing Committee call at 9am PT tomorrow- 7/14. Take a look at the agenda and add items if you'd like here: https://wiki.hyperledger.org/x/sANCAw

rafaelmelo (Thu, 29 Jul 2021 18:27:59 GMT):
Has joined the channel.

agunde (Thu, 05 Aug 2021 19:13:39 GMT):
Reminder that the next Sawtooth Contributor Session is tomorrow at 10am US/Central. The zoom link is https://us02web.zoom.us/j/88191563969?pwd=NkloR1VhQ2JLU1RQZW0vY2hHS1VOQT09 (different than normal link) and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

AbhijeetBH 2 (Wed, 25 Aug 2021 08:09:30 GMT):
Hi all! After struggling to solve this issue on production and taking a few measures, I am coming back to the forum in the hope of getting help on the following ticket: https://lists.hyperledger.org/g/sawtooth/message/852 I am also facing a similar issue with 2 out of 7 nodes on production, and it has brought the complete blockchain to a halt. Following is a detailed backtrace:
```
[2021-08-25 08:07:56.942 ERROR ffi] [src/state/merkle_ffi.rs: 423] Address 10373e83d96f8ef69fe93be62180bee3e0b4fa49d1294b0c27c46f5671bed181, in deletions, was not found.
thread '' panicked at 'No method get_batch_execution_result on python scheduler: PyErr { ptype: , pvalue: Some(KeyError('Value was not found',)), ptraceback: Some() }', src/libcore/result.rs:999:5
stack backtrace:
 0: 0x7f62860e4b03 - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h6485381528590a55
      at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
 1: 0x7f62860dc7ab - std::sys_common::backtrace::_print::h49a82ae9552e35c7
      at src/libstd/sys_common/backtrace.rs:71
 2: 0x7f62860e0d36 - std::panicking::default_hook::{{closure}}::he20974adbefcc046
      at src/libstd/sys_common/backtrace.rs:59
      at src/libstd/panicking.rs:197
 3: 0x7f62860e0ac9 - std::panicking::default_hook::he4af6af4ac7fef7b
      at src/libstd/panicking.rs:211
 4: 0x7f62860e143f - std::panicking::rust_panic_with_hook::h057ff03eb4c8000f
      at src/libstd/panicking.rs:474
 5: 0x7f62860e0fc1 - std::panicking::continue_panic_fmt::ha6d6ae144369025b
      at src/libstd/panicking.rs:381
 6: 0x7f62860e0ea5 - rust_begin_unwind
      at src/libstd/panicking.rs:308
 7: 0x7f628610a03c - core::panicking::panic_fmt::hc4f83bfed80aeabd
      at src/libcore/panicking.rs:85
 8: 0x7f6285d508ed - core::result::unwrap_failed::h4e6021f3814dea74
 9: 0x7f6285f18a10 - core::ops::function::impls:: for &mut F>::call_once::h57b1d53639052476
10: 0x7f6285d56067 - <&mut I as core::iter::traits::iterator::Iterator>::next::ha31cc21dc9369295
11: 0x7f6285cadd8a - as alloc::vec::SpecExtend>::from_iter::had44e8f5062869bc
12: 0x7f6285f3b0ca - ::complete::he4a2710d77656596
13: 0x7f6285d8fc70 - sawtooth_validator::journal::block_validator::BlockValidationProcessor::validate_block::h42228c67dd79a36a
14: 0x7f6285f11084 - std::sys_common::backtrace::__rust_begin_short_backtrace::ha5884d17750b97f3
15: 0x7f6285f182ef - std::panicking::try::do_call::hd3adc0ff8af40e20
16: 0x7f62860f2409 - __rust_maybe_catch_panic
      at src/libpanic_unwind/lib.rs:85
17: 0x7f6285d6c976 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heba166d4acd621f0
18: 0x7f62860c364e - as core::ops::function::FnOnce>::call_once::h805c3cc89d534c05
      at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/liballoc/boxed.rs:704
19: 0x7f62860f10bf - std::sys::unix::thread::Thread::new::thread_start::h6f10b78f26c98dc6
      at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/liballoc/boxed.rs:704
      at src/libstd/sys_common/thread.rs:13
      at src/libstd/sys/unix/thread.rs:79
20: 0x7f628a9ee6da - start_thread
21: 0x7f628a4ff88e - __clone
22: 0x0 -
```

arsulegai (Thu, 26 Aug 2021 14:04:56 GMT):
@AbhijeetBH 2 which version of Sawtooth are you running?

arsulegai (Thu, 26 Aug 2021 14:05:18 GMT):
Check why it says the deletion entries were not found.

AbhijeetBH 2 (Mon, 30 Aug 2021 04:10:11 GMT):
I am running hyperledger/sawtooth-validator:chime. How can I check that? What are the troubleshooting steps?

AbhijeetBH 2 (Tue, 31 Aug 2021 05:21:38 GMT):

Screenshot from 2021-08-31 10-48-00.png

agunde (Fri, 03 Sep 2021 13:02:55 GMT):
Reminder that the next Sawtooth Contributor Session is today at 10am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

isabeltomb (Fri, 03 Sep 2021 15:01:42 GMT):
Has joined the channel.

Dan (Thu, 16 Sep 2021 17:55:05 GMT):
is there anywhere I can read about the rationale for moving the docs to their own repo? I don't have an issue with it, just curious.

amundson (Mon, 27 Sep 2021 15:56:37 GMT):
@Dan The rationale is essentially that assembling the docs from a bunch of separate repos is a failed experiment, and that by treating documentation completely as a separate component (the website), it is easier to get good results.

RajaramKannan (Mon, 18 Oct 2021 18:11:07 GMT):
Hi @AbhijeetBH 2 did you get a resolution or a workaround for this? We are on 1.1.5, though, and the error is similar although not the same. Whenever we send a transaction the validator crashes....

RajaramKannan (Tue, 19 Oct 2021 05:50:45 GMT):
We are suddenly seeing a crash in the validators in our UAT environment (we are still on 1.1.5). We have narrowed it down to the fact that when a single batch comes in, it works fine and doesn't crash. But when multiple batches come in (not even that many - say 5-10), the validator crashes!

RajaramKannan (Tue, 19 Oct 2021 05:51:02 GMT):
Stack trace:
```
thread '' panicked at 'No method complete on python scheduler: PyErr { ptype: , pvalue: Some(KeyError('Value was not found',)), ptraceback: Some() }', src/libcore/result.rs:997:5
stack backtrace:
 0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
      at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
 1: std::sys_common::backtrace::_print
      at src/libstd/sys_common/backtrace.rs:70
 2: std::panicking::default_hook::{{closure}}
      at src/libstd/sys_common/backtrace.rs:58
      at src/libstd/panicking.rs:200
 3: std::panicking::default_hook
      at src/libstd/panicking.rs:215
 4: std::panicking::rust_panic_with_hook
      at src/libstd/panicking.rs:478
 5: std::panicking::continue_panic_fmt
      at src/libstd/panicking.rs:385
 6: rust_begin_unwind
      at src/libstd/panicking.rs:312
 7: core::panicking::panic_fmt
      at src/libcore/panicking.rs:85
 8: core::result::unwrap_failed
 9: ::complete
10: sawtooth_validator::journal::candidate_block::CandidateBlock::summarize
11: sawtooth_validator::journal::publisher::BlockPublisher::summarize_block
12: block_publisher_summarize_block
13: ffi_call_unix64
14: ffi_call
15: _ctypes_callproc
16:
17: PyObject_Call
18: PyEval_EvalFrameEx
19:
20: PyEval_EvalCodeEx
21:
22: PyObject_Call
23: PyEval_EvalFrameEx
24:
25: PyEval_EvalFrameEx
26:
27: PyEval_EvalFrameEx
28: PyEval_EvalFrameEx
29: PyEval_EvalFrameEx
30: PyEval_EvalFrameEx
31:
32: PyEval_EvalCodeEx
33:
34: PyObject_Call
35: PyEval_EvalFrameEx
36:
37: PyEval_EvalCodeEx
38:
39: PyObject_Call
40: PyEval_EvalFrameEx
41: PyEval_EvalFrameEx
42:
43: PyEval_EvalCodeEx
44:
45: PyObject_Call
46: PyEval_EvalFrameEx
47: PyEval_EvalFrameEx
48: PyEval_EvalFrameEx
49:
50: PyEval_EvalCodeEx
51:
52: PyObject_Call
53:
54: PyObject_Call
55: PyEval_CallObjectWithKeywords
56:
57: start_thread
58: clone
fatal runtime error: failed to initiate panic, error 5
```

RajaramKannan (Tue, 19 Oct 2021 05:51:27 GMT):
any troubleshooting tips or workarounds will be much appreciated

RajaramKannan (Tue, 19 Oct 2021 05:52:09 GMT):
Note we saw a different error when it was in the parallel scheduling setup; this is for serial, but I think the root cause is the same. Except we don't know the root cause.

RajaramKannan (Tue, 19 Oct 2021 05:57:37 GMT):
The Docker exit status is 139, i.e. SIGSEGV.

arsulegai (Tue, 19 Oct 2021 10:10:13 GMT):
@RajaramKannan quick question: do you have both the add and delete state operations in your smart contract as part of any transaction?

arsulegai (Tue, 19 Oct 2021 10:15:36 GMT):
Other things you can confirm: see which address it claims is not found (is it pre-existing state or new state), and whether any of your batches/transactions is trying to do something with it.

RajaramKannan (Tue, 19 Oct 2021 10:15:48 GMT):
I suspect we do, I'll check. We are narrowing down whether any specific transaction is causing it, but if you have anything we can check as a hypothesis, it will be helpful.

arsulegai (Tue, 19 Oct 2021 10:17:52 GMT):
Since you're on older branch, there was an issue back then ~ it was to do with delete and add state operations happening in one go. Maybe you can draw up a tree structure, see if what you're adding is in the tree path of what is getting deleted.

arsulegai (Tue, 19 Oct 2021 10:19:42 GMT):
The error log that I see here is different from what was observed earlier, though. This is occurring at the summarize-block step? (A forking issue, maybe?) Which consensus algorithm are you on?

RajaramKannan (Tue, 19 Oct 2021 10:19:55 GMT):
We are on PBFT ...

RajaramKannan (Tue, 19 Oct 2021 10:20:49 GMT):
The above was after changing it to serial yesterday. When we had it as parallel we were seeing:
```
thread '' panicked at 'No method get_batch_execution_result on python scheduler: PyErr { ptype: , pvalue: Some(KeyError('Value was not found',)), ptraceback: Some() }', src/libcore/result.rs:997:5
note: Run with RUST_BACKTRACE=1 environment variable to display a backtrace.
fatal runtime error: failed to initiate panic, error 5
```

RajaramKannan (Tue, 19 Oct 2021 10:20:59 GMT):
but I did not have the backtrace on at that point

RajaramKannan (Tue, 19 Oct 2021 10:22:09 GMT):
Unfortunately it was not showing which Key... (If that was an address for example)

arsulegai (Tue, 19 Oct 2021 10:38:47 GMT):
Need some logs from before the crash, to understand what the scenario was. It appears that `summarize_block` is trying to get a state that is missing. Maybe prior log traces (including block IDs for reference) would help to understand what's happening there.

arsulegai (Tue, 19 Oct 2021 10:40:29 GMT):
@amundson @pschwarz @agunde have you seen this error before?

RajaramKannan (Tue, 19 Oct 2021 10:47:10 GMT):
There doesn't seem to be much useful in the logs; let me get them in any case... We are actually also seeing this sometimes for only 1 transaction/batch, apparently. I will post it shortly...

RajaramKannan (Tue, 19 Oct 2021 11:16:23 GMT):

crash-19-10-2021-1 copy.docx

RajaramKannan (Tue, 19 Oct 2021 11:50:06 GMT):

crash-19-10-2021-1 copy.docx

amundson (Tue, 19 Oct 2021 13:22:09 GMT):
"No method" means it tried to call that function on the python object and it wasn't there

amundson (Tue, 19 Oct 2021 13:23:15 GMT):
Has anything about the environment changed recently? (How you start things up, etc?)

RajaramKannan (Tue, 19 Oct 2021 14:07:21 GMT):
@amundson we did change it to auto-restart if the container goes down, but nothing else. Also, this issue happens even if the container is not auto-restarted.

RajaramKannan (Tue, 19 Oct 2021 14:09:51 GMT):
We noticed it seems to happen for a transaction where we are setting an address. Interestingly, other transactions in the past should have already set that address and have not (not because the previous transactions did not try, but there was no failure and neither were the values set). This is, however, circumstantial, although we are currently able to isolate and replicate this behavior; the logs, as you can see, do not show any further details...

amundson (Fri, 29 Oct 2021 13:09:08 GMT):
FYI - the sawtooth meeting this morning is canceled. See you next time.

RajaramKannan (Fri, 29 Oct 2021 13:32:31 GMT):
@amundson @arsulegai just following up on our issue with the crash. Background: we have an account-based model, where the account holdings for each account are at a particular address in the tree, in the form of a simple JSON map such as {"USD": 3000}. We have a number of transactions that primarily update the holdings and send out a custom event. None of the transactions delete state at any address; they either set if not previously set, or else update. We went back and reviewed all our daily LMDB backups as well as the logs prior to the crash. Interestingly, that particular address did have the data/state set up until the transaction that crashed. In the logs of the transaction where it crashed, we print the current data in the state, and it was blank! And then it crashed. The transaction just prior to that was the same type of transaction, and it got data from the state fine and went through... It almost feels like one specific address in LMDB suddenly got corrupted (not sure what the right term is!) and any time any operation (maybe a set or update) on it is done now, it crashes the validator on all nodes. Any ideas? We haven't seen it yet in prod, but we need to figure out why it happened in our UAT environment so that we can try to avoid it in prod.

AbhijeetBH 2 (Mon, 01 Nov 2021 04:09:02 GMT):
Are you for some reason sending payloads which have the exact same hash signatures? Usually the `nonce` field is used to vary the hash signature between transactions.
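
For illustration, a minimal sketch of setting a unique nonce when building a header with the Python SDK (the surrounding header fields are elided here):
```python
# Give every transaction a unique nonce so two transactions with identical
# payloads still serialize (and therefore hash/sign) differently.
from uuid import uuid4
from sawtooth_sdk.protobuf.transaction_pb2 import TransactionHeader

header = TransactionHeader(
    # ... family_name, family_version, inputs, outputs,
    #     payload_sha512, signer/batcher public keys, etc. ...
    nonce=uuid4().hex,  # fresh random value per transaction
)
```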

RajaramKannan (Mon, 01 Nov 2021 05:11:50 GMT):
We do generate a unique nonce, but I don't think this is the issue. This has happened after running maybe 100K txns overall, and at least several hundred that have set or updated that specific address location with no issues, until it crashed for the first time. Any new transaction after that to that address location ends up crashing the validator. (And the TP logs show that there is no data there either, when it was there prior to the crash.)

AbhijeetBH 2 (Tue, 02 Nov 2021 03:49:49 GMT):
So this seems to be a new sort of error, since the issue I faced tends to get solved once I restart all the validator nodes and retry the transaction with a new nonce.

RajaramKannan (Tue, 02 Nov 2021 07:14:46 GMT):
yes, first time we are seeing this and it has been up for 2 years continuously now!

Aljone (Tue, 09 Nov 2021 16:04:34 GMT):
Has joined the channel.

Aljone (Tue, 09 Nov 2021 16:05:12 GMT):
```
const DEFAULT_BATCH_POOL_SAMPLE_SIZE: usize = 5;
const DEFAULT_BATCH_POOL_INITIAL_SAMPLE_VALUE: usize = 30;
const BATCH_POOL_MULTIPLIER: usize = 10;
```

Aljone (Tue, 09 Nov 2021 16:08:07 GMT):
Hi team, I have a question regarding the pending batch pool's limit:
```
const DEFAULT_BATCH_POOL_SAMPLE_SIZE: usize = 5;
const DEFAULT_BATCH_POOL_INITIAL_SAMPLE_VALUE: usize = 30;
const BATCH_POOL_MULTIPLIER: usize = 10;
```
Is there a way to configure DEFAULT_BATCH_POOL_INITIAL_SAMPLE_VALUE in /sawtooth-lib/libsawtooth/src/journal/publisher/batch_pool.rs? The validator sometimes returns an HTTP/1.1 429 status when there is heavy request load, so I want to test by increasing it a little. Please help.

RajaramKannan (Fri, 03 Dec 2021 02:49:08 GMT):
@amundson @arsulegai One more instance of a crash that seems to be related to data at an address in LMDB going bad (not sure what to call it - maybe corrupted?). Same sequence: a transaction was sent that basically gets from that address, updates the values, and sets to that address. When the next transaction for the same address does a get, it sees [] instead of the values previously set, and then all the nodes crash (from prior logs, on block summarizing). The txn seems quite innocuous; it just does a get/set on that address. I have copied down the LMDB file and done an mdb_dump, but obviously can't make out what is going on inside :-). I want to see what the actual value stored there is now, if any, at that address. 1. Any further hypothesis on what may be going wrong? 2. Any recommendations on how I can read the LMDB file to see what is there at that address now?

amundson (Tue, 14 Dec 2021 14:29:32 GMT):
@RajaramKannan you can also get state out of the REST API https://sawtooth.hyperledger.org/docs/core/releases/1.2/rest_api/endpoint_specs.html - GET /state/{address}
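
A small usage sketch against that endpoint, assuming a local REST API on the default port 8008 and using the address from the earlier log purely as an example:
```python
# Fetch the current value at a state address via the REST API
# (GET /state/{address}); host/port and the address are examples.
import base64
import requests

address = '10373e83d96f8ef69fe93be62180bee3e0b4fa49d1294b0c27c46f5671bed181'
resp = requests.get('http://localhost:8008/state/{}'.format(address))
resp.raise_for_status()
# The REST API base64-encodes the raw state bytes in the 'data' field.
raw = base64.b64decode(resp.json()['data'])
print(raw)
```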

RajaramKannan (Wed, 15 Dec 2021 04:23:27 GMT):
@amundson thanks, we already did that and it shows []. The state previously had values (we back up LMDB daily) and was never deleted, just set/updated to new values. That, plus the fact that any operation on that state is causing all our nodes to crash, prompted me to see if we can explore the LMDB file externally as well...

amundson (Tue, 21 Dec 2021 15:23:59 GMT):
@RajaramKannan at a high-level, the lmdb database is key,value where key is the hash of a node of a tree and the value is a cbor-encoded struct of the data at that node of the tree. if you decoded the cbor-encoded value, one of the values of the struct will be the bytes set by the smart contract. if you wanted to read the database, you have to start at the root of the tree and work your way down; to do that you use the state root hash stored within a block.
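
A minimal sketch of that walk in Python, assuming the node layout used by the 1.x Python MerkleDatabase (a CBOR map with 'v' for the value bytes and 'c' for children keyed by two-hex-character address segments), a named LMDB database called 'main', and the default merkle-00.lmdb path; verify all of these against your Sawtooth version before relying on it:
```python
# Walk the merkle trie stored in the state LMDB file, from a state root
# hash (taken from a block header) down to the value at one address.
import cbor
import lmdb

def get_state_value(lmdb_path, state_root_hash, address):
    env = lmdb.open(lmdb_path, readonly=True, subdir=False, lock=False,
                    max_dbs=1)
    db = env.open_db(b'main', create=False)
    with env.begin(db=db) as txn:
        node_hash = state_root_hash
        # Addresses are 70 hex chars; descend one 2-char token at a time.
        for i in range(0, len(address), 2):
            raw = txn.get(node_hash.encode())
            if raw is None:
                return None  # broken link in the trie
            node = cbor.loads(raw)
            node_hash = node['c'].get(address[i:i + 2])
            if node_hash is None:
                return None  # address not present under this state root
        leaf = cbor.loads(txn.get(node_hash.encode()))
        return leaf['v']  # the bytes the smart contract stored

# Example (paths/hashes are placeholders):
# value = get_state_value('/var/lib/sawtooth/data/merkle-00.lmdb',
#                         '<state_root_hash from a block header>',
#                         '<70-hex-char state address>')
```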

RajaramKannan (Wed, 22 Dec 2021 05:16:17 GMT):
thanks @amundson Will let you know if we find anything.

Dan (Thu, 03 Feb 2022 15:00:11 GMT):
can we get some reviews on: https://github.com/hyperledger/sawtooth-sdk-python/pull/30

agunde (Thu, 03 Feb 2022 16:32:10 GMT):
Thanks @Dan PR has been merged.

Dan (Thu, 03 Feb 2022 16:33:06 GMT):
muchas gracias :)

saheli_c (Mon, 07 Feb 2022 07:34:51 GMT):
Has joined the channel.

saheli_c (Mon, 07 Feb 2022 07:34:51 GMT):
In Sawtooth core we are planning to implement some modifications to the parallel scheduler implementation. The original scheduler was developed in Python, but we observe that in newer versions of Sawtooth core all the modules are being implemented in Rust. So would it be better to implement the modifications to the scheduler in Rust and invoke the Rust modules from Python, or to develop these modules in Python itself? Any advice on this is really appreciated.

agunde (Mon, 07 Feb 2022 15:01:31 GMT):
Sawtooth is being rewritten in Rust and in Sawtooth 2.0 the scheduler implementation will come from Transact (There is not currently a finished parallel scheduler in Transact). However 2.0 is still a ways out. So if you are planning to use sawtooth 1.2 your modifications should be written in Python.

agunde (Tue, 15 Mar 2022 17:33:26 GMT):
Reminder that the next Sawtooth Contributor Session is Friday at 10:30am US/Central. The zoom link is https://us02web.zoom.us/j/89262216179?pwd=UzFyK2t6MmxGTkxwR2dydnBkaUtDQT09 and the agenda/working doc is at https://docs.google.com/document/d/1WlF8UfoKAEydJobHWDvz-tMpWqQSTlChoFv50fHX1tM/edit# -- feel free to add agenda items if you have them

agunde (Tue, 15 Mar 2022 17:34:11 GMT):
Reminder has also been posted in the new Hyperledger discord "sawtooth-contributors" channel
