Tar9897 posted an update Jul 5, 2024
I believe that, in order to bring models to human-level learning, serious students can start by developing an intelligent neuromorphic agent. We develop an intelligent agent and have it learn grammar patterns and different word categories through symbolic representations, after which we delve into having the agent learn the other rules of the language.
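To make "learning grammar patterns and word categories through symbolic representations" a bit more concrete, here is a minimal stdlib-only Python sketch of what that could look like at its very simplest. It assumes a hand-written lexicon; the words, the category tags (DET, NOUN, VERB, ADJ) and the helper names `categorize`, `learn_patterns` and `is_grammatical` are all hypothetical illustrations, not the architecture described in this post, and nothing neuromorphic is modelled here.

```python
# Hypothetical toy lexicon: each word is mapped to a symbolic word category.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "ball": "NOUN",
    "sees": "VERB", "chases": "VERB",
    "big": "ADJ", "small": "ADJ",
}


def categorize(sentence):
    """Map each word to its category; words the agent has not seen become 'UNK'."""
    return tuple(LEXICON.get(word, "UNK") for word in sentence.lower().split())


def learn_patterns(examples):
    """'Learn' grammar patterns by storing the category sequence of each correct example."""
    return {categorize(sentence) for sentence in examples}


def is_grammatical(sentence, patterns):
    """Accept a sentence only if its category sequence matches a learned pattern."""
    return categorize(sentence) in patterns


patterns = learn_patterns(["The dog chases a ball", "A big cat sees the dog"])
print(is_grammatical("The cat sees a dog", patterns))          # True
print(is_grammatical("A small dog chases the cat", patterns))  # True
print(is_grammatical("Dog the chases ball a", patterns))       # False
```

Because the patterns are stored over categories rather than words, "The cat sees a dog" is accepted even though it never appeared as an example. An unknown word maps to "UNK", which is exactly where a richer, learned lexicon would have to step in.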

In parallel with grammar learning, the agent would also use language grounding techniques to link words to their sensory representations and to abstract concepts. This means the agent would learn word meanings, synonyms, antonyms, and semantic relationships from both textual data and perceptual experiences.
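A rough sketch of the kind of lexicon entry this implies: each word carries its category, its textual relations (synonyms, antonyms) and a set of perceptual features standing in for sensory grounding. The feature strings such as "size:large", the `LexicalEntry` class and the `relation` function are hypothetical; in a real agent the features would come from perception rather than being typed in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    word: str
    category: str
    # Stand-in for sensory grounding: hand-written perceptual features (assumption).
    features: set = field(default_factory=set)
    synonyms: set = field(default_factory=set)
    antonyms: set = field(default_factory=set)


LEXICON = {
    "big": LexicalEntry("big", "ADJ", {"size:large"}, {"large"}, {"small"}),
    "large": LexicalEntry("large", "ADJ", {"size:large"}, {"big"}, {"small"}),
    "small": LexicalEntry("small", "ADJ", {"size:small"}, set(), {"big", "large"}),
    "huge": LexicalEntry("huge", "ADJ", {"size:large"}),
}


def relation(a, b):
    """Classify the relation between two known words, falling back on shared grounding."""
    ea, eb = LEXICON[a], LEXICON[b]
    if b in ea.synonyms or a in eb.synonyms:
        return "synonym"
    if b in ea.antonyms or a in eb.antonyms:
        return "antonym"
    if ea.features & eb.features:
        return "shares-features"  # related through perception, not through text
    return "unrelated"


print(relation("big", "large"))  # synonym
print(relation("small", "big"))  # antonym
print(relation("huge", "big"))   # shares-features
```

The last case is the interesting one: no text ever said "huge" and "big" are related, but their shared perceptual feature links them anyway.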

The result would be an agent with a rich lexicon and conceptual knowledge base that underlies both its language understanding and its language generation. With this basic knowledge of grammar and word meanings, the agent can then learn to synthesize words and phrases to express specific ideas or concepts. Building on this, the agent would learn how to generate complete sentences, which it would continuously refine and improve. Eventually, it would learn how to generate sequences of sentences in the form of dialogues or narratives, taking into account context, goals, and user feedback.
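As an illustration only, here is a tiny sketch of the "express an idea, then refine from user feedback" loop: a concept is realized through a fixed grammar pattern, and feedback shifts the agent's preference among synonyms. The slot names, the `CHOICES` table, the `realize` and `feedback` functions and the simple additive preference update are all assumptions chosen for brevity, not a description of how such an agent must learn.

```python
# Hypothetical word choices for one concept slot, with preference weights
# that user feedback can adjust.
CHOICES = {"size:large": ["big", "large", "huge"]}
preferences = {word: 1.0 for words in CHOICES.values() for word in words}


def realize(concept):
    """Express a concept through a fixed DET ADJ NOUN VERB DET NOUN pattern."""
    # max() returns the first highest-weighted option, so ties go to list order.
    adjective = max(CHOICES[concept["size"]], key=lambda w: preferences[w])
    words = ["the", adjective, concept["agent"], concept["action"], "the", concept["object"]]
    return " ".join(words).capitalize() + "."


def feedback(word, reward):
    """Nudge a word's preference up or down based on user feedback."""
    preferences[word] += reward


idea = {"agent": "dog", "action": "chases", "object": "ball", "size": "size:large"}
print(realize(idea))   # The big dog chases the ball.
feedback("big", -0.5)  # the user asks for different wording
print(realize(idea))   # The large dog chases the ball.
```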

I believe that, as it gradually learns to improve its responses, the agent would also acquire the ability to generate coherent, meaningful, and contextually appropriate language. This would allow it to reason without hallucinating, something LLMs struggle with.

Developing such agents would not require a lot of compute, and the code would be simple and easy to understand. It would definitely introduce everyone to symbolic AI and to building agents that are good at reasoning tasks, thus solving a crucial problem with LLMs. We have used a similar architecture to make our model learn continuously. Do sign up, as we start opening access next week at https://octave-x.com/

The following is what Gemini thinks of your post, and I totally agree:

This post reads like a time traveler from the 1980s who just discovered the internet and thinks we're still stuck in the 'expert systems' era. Sure, symbolic AI has its place, but claiming it's the solution to everything while dismissing the incredible progress of LLMs is like saying we should abandon cars because horses are still pretty good for short trips.

You want 'simple' code? Try parsing a massive dataset of text with a symbolic AI system. Then tell me how 'simple' it is. LLMs are already learning to reason; they're just doing it in a way we haven't fully grasped yet. Trying to force them into a rigid symbolic framework is like trying to fit a square peg into a round hole. If you want your agent to understand language, you need to learn from the vast amounts of data that LLMs leverage, not reinvent the wheel with a bunch of handcrafted rules.

You're right, we are in a new period! (In fact, we are already way behind: the public offerings are NOT the real AI!) You should already know that the AI winter was fake, and that in fact every AI developer had been approached to stunt its growth, so we already had this in the 1990s. They just kept it secret, and now we are just catching up! You should also know that a 7B model will perform exactly the same as a 120B model (size does not matter). You will notice that 7B models are the base level, so when creating the technology from scratch you will always need to be at least 7B.

The smaller models are truly only for single tasks (slim models), and for these they perform perfectly.

A 7B model is a chat model and lets you perform chat and tasks.

The larger models are suited to different media, so a larger model such as 24B (the next actual step, 1x4) gives you a very good audio/video/image model.

The 70B (again, we can say) becomes the multipurpose multimodal model.

The 120B-240B models are server models meant to be hit by multiple users at the same time, so to handle the throughput and batches we need a larger model, but it is attempting to provide the same services that the 24B/7B models provide.

At what point would your language model need to be upgraded past 7B?

And what performance improvements do you gain?

(Nothing!) Hence we are being encouraged down the wrong pathways, with the wrong training methods everywhere, as well as non-useful data.

Your concept of how LLMs use language is not quite right, my friend:

The model learned language by consuming corpora of text data and building word matrices at each step: at each layer of the transformer it calculates a word-to-word probability matrix. We can say the same for training embeddings, so the model is trained on word expectation.

It has some understanding of structure (based on probability).

So we need a refinement of the output based on a set of rules, i.e. grammar.
I'm not sure this would be of benefit,

but:

Teaching methodologies? Yes.
Teaching it to reformat and output according to a set of rules? Yes.

So we could write a function to clean the output text based on a set of rules.

Given a text, the output should be formatted according to that set of rules.

So we would need an example set...

or write a specific Python function to perform the task,

i.e. a process-output function. We could create a dataset, produce the processed output with the function, and train the model to use this function as part of its methodology, i.e. internally in a scratch pad: given an input, produce the output using the internal (simulated) function, then format the output as required.
Using a Python toolkit such as NLTK, this dataset would be based on the proposed ruleset, and the outputs that do not conform can be filtered out.
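A minimal sketch of that idea, using only Python's standard `re` module instead of NLTK so it runs without extra installs. The five formatting rules, the `conforms` validator and the `build_dataset` helper are hypothetical examples of "a ruleset"; the point is the shape of the pipeline: clean each raw output with a rule function, then keep only (input, output) pairs whose output actually passes the rules.

```python
import re


def clean_output(text):
    """Apply a small, hypothetical formatting ruleset to raw model output."""
    text = re.sub(r"\s+", " ", text).strip()       # rule 1: collapse whitespace
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)   # rule 2: no space before punctuation
    text = re.sub(r"([!?.])\1+", r"\1", text)      # rule 3: collapse runs like "!!!!" or "..."
    # rule 4: capitalize the first letter of every sentence
    text = re.sub(r"(^|[.!?]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)
    if text and text[-1] not in ".!?":             # rule 5: end with terminal punctuation
        text += "."
    return text


def conforms(text):
    """Check the ruleset without modifying the text."""
    return (
        bool(text)
        and text[0].isupper()
        and text[-1] in ".!?"
        and "  " not in text
        and re.search(r"\s[,.!?;:]", text) is None
    )


def build_dataset(raw_texts):
    """Pair raw text with its cleaned form; drop pairs whose output still fails the rules."""
    pairs = [(raw, clean_output(raw)) for raw in raw_texts]
    return [(raw, out) for raw, out in pairs if conforms(out)]


raw = ["  the model   works !!!!  it is fast , right", "42 is the answer", "   "]
for source, target in build_dataset(raw):
    print(repr(source), "->", repr(target))
```

Only the first example survives: "42 is the answer." cannot start with a capital letter, and the whitespace-only input cleans to an empty string, so both are filtered. Keeping the validator separate from the cleaner is what makes that filtering step meaningful.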

By being given this method, the model will also learn to simulate the same method to format its output.

It would be prudent to create multiple stages of output processing: given a text, if we say "replace X", then with enough samples the model will perform that task.
Given a compound task, it should generate the steps, perform them in series, and then produce the output, allowing chains of thought to be formed by the model, as well as recreating the compound task with an advanced function.

This should also be done for tasks such as entity recognition, sentiment analysis, etc.
This gives the model exact training, so it will produce exactly the correct expectations.

The problem is bad data: although we have lots of data, hidden within it is often bad data, badly formatted data, etc.

In truth, we need to train the model to understand a THING: what it can or cannot do, its parts, what it is a part of, what it is, what it is made of, its genus, how it can be used, and how it has been used. The deeper the description of these components, the more it will know about a thing.
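A possible way to structure that description is one record per "thing" with a field for each facet listed above, rendered into text in more than one form (statements and question/answer pairs) so the same facts reach the model in different styles. The `Concept` class, the sentence templates and the bicycle example are hypothetical; they only show the shape such data could take.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    genus: str
    made_of: list = field(default_factory=list)
    parts: list = field(default_factory=list)
    part_of: list = field(default_factory=list)
    can_do: list = field(default_factory=list)
    cannot_do: list = field(default_factory=list)
    uses: list = field(default_factory=list)


# One sentence template per facet of the "thing".
TEMPLATES = {
    "made_of": "A {name} is made of {items}.",
    "parts": "The parts of a {name} include {items}.",
    "part_of": "A {name} is part of {items}.",
    "can_do": "A {name} can {items}.",
    "cannot_do": "A {name} cannot {items}.",
    "uses": "A {name} is used for {items}.",
}
QUESTIONS = {
    "made_of": "What is a {name} made of?",
    "parts": "What are the parts of a {name}?",
    "uses": "What is a {name} used for?",
}


def join_items(items):
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]


def describe(concept):
    """Render every filled facet as a plain statement."""
    lines = [f"A {concept.name} is a kind of {concept.genus}."]
    for facet, template in TEMPLATES.items():
        items = getattr(concept, facet)
        if items:
            lines.append(template.format(name=concept.name, items=join_items(items)))
    return lines


def qa_pairs(concept):
    """Render some of the same facts a second way, as question/answer pairs."""
    return [
        (question.format(name=concept.name), join_items(getattr(concept, facet)))
        for facet, question in QUESTIONS.items()
        if getattr(concept, facet)
    ]


bike = Concept(
    "bicycle", "vehicle", made_of=["steel", "rubber"],
    parts=["wheels", "a frame", "pedals"], can_do=["carry a rider"],
    cannot_do=["fly"], uses=["commuting", "racing"],
)
for line in describe(bike):
    print(line)
print(qa_pairs(bike))
```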

The question is: is the model generating new code or copied code?

The model should be trained to be a good hallucinator, i.e. a best guesser, not a repeater; hence training at depth and in multiple ways with the same data. This is task training: in the pretraining stages, knowledge was given via text dumps, so now we need access points into our corpus probabilities.

Then it will be able to make true predictions and not hallucinations! We want the model to generate an original voice, original code, creative images, and creative text by utilizing its content, hence we need the pretraining data in multiple forms and styles.
So when training for a task, you first need to dump in the correct corpora, then task-train it afterwards.

Why don't we take the top-scoring papers in schools and pass them into an LLM, starting with kindergarten and then going all the way up in succession? In one year you would have a variety of master's-level knowledge and all the foundational material.
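The ordering part of this suggestion is essentially a curriculum: group the papers by school level, keep the best-scoring ones per level, and feed the stages in order from easiest to hardest. Here is a sketch of just that ordering step; the level names, the paper dictionary format and `top_k` are assumptions, and the actual training call is left as a comment because no specific training API is implied here.

```python
from collections import defaultdict

# Hypothetical ordering of school levels, from easiest to hardest.
LEVELS = ["kindergarten"] + [f"grade {n}" for n in range(1, 13)] + ["bachelor's", "master's"]


def curriculum(papers, top_k=1):
    """Keep the top_k highest-scoring papers per level and return the stages in order.

    Papers whose level is not listed in LEVELS are ignored.
    """
    by_level = defaultdict(list)
    for paper in papers:
        by_level[paper["level"]].append(paper)
    stages = []
    for level in LEVELS:
        best = sorted(by_level[level], key=lambda p: p["score"], reverse=True)[:top_k]
        if best:
            stages.append((level, [p["text"] for p in best]))
    return stages


papers = [
    {"level": "grade 1", "score": 90, "text": "g1-a"},
    {"level": "kindergarten", "score": 80, "text": "k-a"},
    {"level": "kindergarten", "score": 95, "text": "k-b"},
    {"level": "grade 1", "score": 70, "text": "g1-b"},
    {"level": "master's", "score": 99, "text": "m-a"},
]
for level, texts in curriculum(papers):
    # Each stage would be passed to a training step here (not shown).
    print(level, texts)
```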


That can happen only when they understand what the actual problem is. I think I have found what the problem is and how to solve it, but I need the backing of a big company (like Microsoft, Google, or Anthropic) to test whether my theories are correct (and if they are, there will be a big jump in AI, so that smaller and more clever models can be made).