352 lines
15 KiB
Org Mode
352 lines
15 KiB
Org Mode
* Algorithm
|
|
|
|
Background reading: [[https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type][CRDT]]
|
|
|
|
This packages implements the Logoot split algorithm
|
|
~André, Luc, et al. "Supporting adaptable granularity of changes for massive-scale collaborative editing." 9th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE, 2013.~
|
|
|
|
The CRDT-ID blocks are implemented by text property ='crdt-id=.
|
|
A continous range of text with the same ='crdt-id= property represent a CRDT-ID block.
|
|
The ='crdt-id= is a a cons of =(ID-STRING . END-OF-BLOCK-P)=,
|
|
where =ID-STRING= represent the CRDT-ID of the leftmost character in the block.
|
|
If =END-OF-BLOCK-P= is =NIL=, the block is a non-rightmost segment splitted from a larger block,
|
|
so insertion at the right of this block shouldn't be merged into the block by sharing the base CRDT-ID and increasing offset.
|
|
|
|
=ID-STRING= is a unibyte string representing a CRDT-ID (for efficient comparison).
|
|
Every two bytes represent a big endian encoded integer.
|
|
For base IDs, last two bytes are always representing Site ID.
|
|
Stored strings are BASE-ID:OFFSETs. So the last two bytes represent offset,
|
|
and second last two bytes represent Site ID.
|
|
|
|
* Protocol
|
|
|
|
Text-based version
|
|
(it should be easy to migrate to a binary version. Using text for better debugging for now)
|
|
|
|
Note: Starting from =v0.3.0=, we separate /User IDs/ and /Site IDs/.
|
|
Site IDs are /buffer local/ and temporarily assigned to users with writable access.
|
|
|
|
Every message takes the form =(type . body)=
|
|
|
|
- Text Editing
|
|
A peer must obtain a =site-id= before performing the following operations,
|
|
by remote calling =crdt-get-write-access=. See [[Remote Function]].
|
|
+ insert ::
|
|
body takes the form =(buffer-name user-id crdt-id position-hint content)=
|
|
- =position-hint= is the buffer position where the operation happens at the site
|
|
which generates the operation. Then we can play the trick that start search
|
|
near this position at other sites to speedup CRDT ID search
|
|
- =content= is the string to be inserted
|
|
|
|
+ delete ::
|
|
body takes the form =(buffer-name user-id position-hint . crdt-id-list)=
|
|
- =crdt-id-list= is generated from =CRDT--DUMP-IDS= from the deleted text
|
|
|
|
+ cursor ::
|
|
body takes the form
|
|
=(buffer-name user-id point-position-hint point-crdt-id mark-position-hint mark-crdt-id)=
|
|
=*-crdt-id= can be either a CRDT ID, or
|
|
- =nil=, which means clear the point/mark
|
|
- =""=, which means =(point-max)=
|
|
|
|
- Contact information
|
|
|
|
+ contact ::
|
|
body takes the form =(user-id slot value)=
|
|
- =slot= can be one of
|
|
#+BEGIN_SRC emacs-lisp
|
|
name host service focus
|
|
#+END_SRC
|
|
|
|
+ leave ::
|
|
body takes the form =(user-id)=
|
|
|
|
This message is sometime sent from client to server to indicate disconnection,
|
|
if the underlying proxy doesn't indicate disconnection properly.
|
|
|
|
- Login
|
|
+ hello ::
|
|
This message is sent from client to server, when a client connect to the server.
|
|
body takes the form =(protocol-version &optional response)=
|
|
|
|
+ challenge ::
|
|
body takes the form =(salt)=
|
|
|
|
+ login ::
|
|
It's always sent after server receives a hello message.
|
|
Assigns a User ID to the client
|
|
body takes the form =(user-id)=.
|
|
|
|
- Initial Synchronization
|
|
+ sync ::
|
|
This message is sent from server to client to get it sync to the state on the server.
|
|
Might be used for other optimization in the future.
|
|
One optimization I have in mind is let server try to merge all CRDT item into a single
|
|
one and try to synchronize this state to clients at best effort.
|
|
body takes the form =(buffer-name . crdt-id-list)=
|
|
- =crdt-id-list= is generated from =CRDT--DUMP-IDS=
|
|
|
|
+ ready ::
|
|
body takes the form =(buffer-name major-mode-symbol)=
|
|
Indicates the end of a batch of synchronization messages
|
|
(which usually contains some =cursor= messages, a =sync= message,
|
|
and some =overlay-*= messages).
|
|
The client should now try to enable =major-mode-symbol= in the
|
|
synchronized buffer.
|
|
|
|
- Error Recovery
|
|
Note: when a client side error happens, it just sends a =get= message and
|
|
follow initial synchronization procedure to reinitialize the buffer.
|
|
|
|
+ error ::
|
|
body takes the form =(buffer-name error-symbol . error-datum)=.
|
|
This message is sent from server to client to notice that some messages from the
|
|
client is not processed due to error =(error-symbol . error-datum)=.
|
|
Normally client should follow initial synchronization procedure to reinitialize the buffer.
|
|
- =buffer-name= can also be =nil=, which signifies that it's a session error.
|
|
The only reasonable thing to do is to disconnect in this scenario.
|
|
Currently, this happens when client/server protocol version doesn't match.
|
|
|
|
- Buffer Service
|
|
+ add ::
|
|
Indicates that the server has started sharing some buffers.
|
|
body takes the form =buffer-name-list=
|
|
|
|
+ remove ::
|
|
Indicates that the server has stopped sharing some buffers.
|
|
body takes the form =buffer-name-list=
|
|
|
|
+ get ::
|
|
Request the server to resend =sync= message for a buffer.
|
|
body takes the form =(buffer-name)=
|
|
|
|
- Overlay Synchronization
|
|
+ overlay-add ::
|
|
body takes the form
|
|
#+BEGIN_SRC
|
|
(buffer-name user-id logical-clock species
|
|
front-advance rear-advance
|
|
start-position-hint start-crdt-id
|
|
end-position-hint end-crdt-id)
|
|
#+END_SRC
|
|
|
|
+ overlay-move ::
|
|
body takes the form
|
|
#+BEGIN_SRC
|
|
(buffer-name user-id logical-clock
|
|
start-position-hint start-crdt-id
|
|
end-position-hint end-crdt-id)
|
|
#+END_SRC
|
|
|
|
+ overlay-put ::
|
|
body takes the form =(buffer-name user-id logical-clock prop value)=
|
|
|
|
+ overlay-remove ::
|
|
body takes the form =(buffer-name user-id logical-clock)=
|
|
|
|
- <<Remote Function>>
|
|
+ fcap ::
|
|
body takes the form =(fcap-symbol nonce in-states out-states . interactive-form)=
|
|
This grants a "functional capability" to a peer.
|
|
Nonce is a random number to prevent forging capability.
|
|
- =in-states= is a list of state symbols that the function depends on.
|
|
=out-states= is a list of state symbols that the function modifies and should be synchronized
|
|
to the caller.
|
|
See [[Allowed state symbols]].
|
|
|
|
+ funcall ::
|
|
body takes the form
|
|
#+BEGIN_SRC
|
|
(user-id logical-clock spawn-user-id
|
|
state-list nonce fcap-symbol . args)
|
|
#+END_SRC
|
|
- =spawn-user-id= represents the site where the interactive command is originally invoked
|
|
+ It can be different from =user-id= because a remote function can call a remote function!
|
|
This is especially useful when client makes a remote call,
|
|
but the call on the server request some interactive input,
|
|
and such interactive call are remote-called back into the client.
|
|
- =state-list= is an alist of bindings.
|
|
(except that we use 1 element list for the CDRs, to save a dot in the serialized string)
|
|
(CDRs can also be 2 element list of the form =(crdt-id pos-hint)=)
|
|
<<Allowed state symbols>> are
|
|
#+BEGIN_SRC
|
|
window window-point buffer buffer-content point
|
|
mark mark-active transient-mark-mode last-command-event
|
|
#+END_SRC
|
|
|
|
+ return ::
|
|
body takes the form =(user-id logical-clock state-list success-p . return-values)=
|
|
|
|
- Buffer local variables
|
|
+ var :: body takes the form =(buffer-name variable-symbol . args)=
|
|
=args= is passed to the variable receiver =(get variable-symbol 'crdt-variable-receiver)=
|
|
to calculate an updated value.
|
|
The actual format of =args= depends on the variable sender and receiver
|
|
(which supposed implement some CRDT).
|
|
|
|
All peer must make sure they install the same kind of variable sender and receiver
|
|
for =variable-symbol=.
|
|
|
|
- Remote Buffer Process
|
|
+ process ::
|
|
body takes the form =(buffer-name string)=
|
|
Sent from client to server, request sending =string=
|
|
to the process buffer associated to =buffer-name=.
|
|
|
|
+ process-mark ::
|
|
body takes the form =(buffer-name crdt-id position-hint)=.
|
|
|
|
NOTE: for =overlay-put=, =overlay-move= and =process-mark=, server must also broadcast the message
|
|
*back to the client that generated it*, to ensure consistent global history.
|
|
|
|
* Emacs as a collaborative operating system
|
|
|
|
The goal: With a few annotations, developer should be able to make any Emacs application
|
|
collaboration-powered. Emacs should be one of the most powerful collaboration platforms.
|
|
|
|
How: There're plenty of Emacs applications centered around the buffer and buffer-local-variables.
|
|
By implementing synchronization primitives for all components in a buffer,
|
|
pretty much everything can be made collaborative.
|
|
Synchronize arbitrary buffer-local-variable reasonably is hard, but user annotations can help.
|
|
|
|
** How to implement collaboration support for a package
|
|
|
|
~crdt.el~ provides two sets of facilities for adding collaboration support, a command-based one and a state-based one.
|
|
Package hackers are free to combine them to provide desired behavior.
|
|
|
|
*** Command-based collaboration
|
|
|
|
This is a simple method to add collaboration support.
|
|
After registering a command with =crdt-register-remote-command=,
|
|
an =:around= advice is added such that when a client invoke this command,
|
|
an request is sent to the server instead of running the command locally.
|
|
|
|
Hackers must make sure that they declare what sets of buffer state the command uses
|
|
to fully preserve user intent.
|
|
|
|
Although relatively simple, collaboration command implemented using this method
|
|
must go through a round trip to the server and will incur latency.
|
|
|
|
**** Why we need used-state-set annotations
|
|
|
|
Suppose Alyssa P. Hacker does =(crdt-register-remote-command 'eval-last-sexp)=,
|
|
but didn't declare that =eval-last-sexp= uses content of the buffer.
|
|
Now the hackers are conspiring in an ~crdt.el~ session.
|
|
Ben Bitdiddle places cursor after =(+ 1 1)= and run =eval-last-sexp=.
|
|
However, the moment Ben Bitdiddle's request arrives at the server,
|
|
Cy D. Fect has changed =(+ 1 1)= to =(+ 1 2)= (their message arrives first!).
|
|
Now the server does what it sees and return =3=, instead of =2=.
|
|
|
|
The correct solution is to let the server roll-back to the state when Ben Bitdiddle invoked the command.
|
|
It is relatively expensive thus we don't want to do this for every command,
|
|
thus we require package hackers to annotate explicitly.
|
|
|
|
/The above mechanism haven't been implemented yet!/
|
|
But adding annotations now will help adding it in the future.
|
|
To implement this mechanism we need to add lamport timestamp to every messages
|
|
(which may corresponds to mutation of interesting states),
|
|
and send a vector clock in =command= messages which depend on buffer content.
|
|
|
|
*** State-based collaboration
|
|
|
|
We can also synchronize the underlying state of the packages
|
|
rather than proxying user-level commands.
|
|
If there're good CRDT candidates to be used for the state
|
|
(hackers need to understand what concurrency semantics their state need to have!),
|
|
then the commands can have real-time effect without needing to be acknowledged from the server.
|
|
|
|
=crdt-org-sync-overlay-mode= is an example of this approach.
|
|
|
|
Overall, this method is much more complicated than command-base method.
|
|
Development of the facility is still on-going.
|
|
|
|
** TODO Task list for ~crdt.el~ facility
|
|
- [X] synchronize buffer text (insert/delete)
|
|
- [X] synchronize overlays
|
|
- [-] synchronize major/minor modes
|
|
+ [X] initial synchronization of major modes
|
|
+ [ ] toggle minor modes on the fly
|
|
+ [X] change major modes on the fly
|
|
- [-] set of synchronization primitives for buffer local variables
|
|
+ [-] server dictated
|
|
+ [ ] non incremental
|
|
+ [X] naive incremental
|
|
+ [ ] state-of-the-art level tree diff
|
|
+ [ ] a library of CRDTs
|
|
- [X] synchronize text properties (any use case for this?)
|
|
+ [X] synchronize when new text is inserted
|
|
+ [X] synchronize when changed
|
|
- [ ] synchronize markers (any use case for this?)
|
|
- [-] remote command
|
|
+ [X] basic remote command (only possibly use =(point)=)
|
|
+ [X] command that uses region
|
|
+ [ ] correctly handle command that uses buffer content
|
|
+ [ ] handle arbitrary =interactive= form (firstly, what's the right thing to do?)
|
|
- [-] remote buffer process
|
|
+ [X] process mark
|
|
+ [X] send to process
|
|
+ [ ] make sure "pseudo process" really looks like process
|
|
(define complete set of advices)
|
|
|
|
** Notes and examples of CRDTize built-in packages
|
|
|
|
Search for =;;; Built-in package integrations= in ~crdt.el~
|
|
|
|
* TODO Cross-editor support
|
|
|
|
The current plan is to reuse the Emacs implementation as a local server for any other editor, aka Emacs as a service.
|
|
The benefit is that we don't need to reimplement the sophiscated CRDT algorithm in other +uncivilized+ environments.
|
|
We then just need to design a thin protocol that communicate between local Emacs and the other editor.
|
|
Since this protocol communicate only locally, the latency should be negligible,
|
|
therefore we use a blocking reader/writer lock based synchronization scheme.
|
|
|
|
** Lock: modes of operations
|
|
|
|
It turns out that I vastly over-estimated the extensibility of /The Other Editors/.
|
|
For example, lots of them (including M$ vScoDe and cult 666) doesn't seem to have anything like =pre-command-hook=,
|
|
making it impossible to implement a usual bidirectional locking mechanism
|
|
(because we can't tell those editors to acquire lock from Emacs before running commands that potentially modify the buffer).
|
|
|
|
Currently I implemneted a hack that by default let /The Other Editors/ hold the lock, but upon receiving
|
|
an =acquire= from Emacs, let /The Other Editors/ dead loops to hopefully halt command execution until Emacs gives back the lock.
|
|
Emacs thus must give back lock as soon as possible to un-hang /The Other Editors/.
|
|
|
|
Q: What if Emacs GCs?
|
|
/Q got thrown out of the window./
|
|
|
|
** Bridge protocol
|
|
|
|
- Reader/writer lock
|
|
+ aquire :: body takes the form =()=
|
|
+ release :: body takes the form =()=
|
|
|
|
The rest is mostly analogue to the primary protocol for Emacsen,
|
|
except that CRDT IDs are replaced by explicit integer position (start from 1, as in Emacs).
|
|
|
|
- Text Editing
|
|
+ insert :: body takes the form =(buffer-name position content)=
|
|
+ delete :: body takes the form =(buffer-name position length)=
|
|
|
|
- Peer State
|
|
+ cursor :: body takes the form =(buffer-name user-id point-position mark-position)=
|
|
=*-position= can be either an integer, or
|
|
- =nil=, which means clear the point/mark
|
|
|
|
+ contact :: same as primary protocol.
|
|
+ leave :: same as primary protocol.
|
|
|
|
- Login
|
|
Note that we don't include challenge/response authentication mecahnism.
|
|
|
|
+ hello :: same as primary protocol.
|
|
+ login :: same as primary protocol.
|
|
|
|
- Initial Synchronization
|
|
+ sync :: body takes the form =(buffer-name content-string)=
|
|
+ ready :: same as primary protocol.
|
|
|
|
- Buffer Service
|
|
+ add :: same as primary protocol.
|
|
+ remove :: same as primary protocol.
|
|
+ get :: same as primary protocol.
|